
Lessons from Abeba Birhane
Cognitive scientist Abeba Birhane audits large-scale AI datasets to study how machine learning models affect marginalized communities. Her audits found that increasing dataset size amplifies toxic content rather than diluting it, and she argues that technology must be judged by its effect on human relationships. This profile collects her perspectives on dataset curation, algorithmic accountability, and the ethical limits of artificial intelligence.
Part 1: Algorithmic Injustice and Harms
- On Systemic Injustice: "When algorithmic injustice and harm are brought to the fore, most of the solutions on offer (1) revolve around technical solutions and (2) do not center disproportionally impacted communities." — Source: [Patterns Journal]
- On the Harm of Categorization: Classifying humans into rigid, machine-readable buckets often strips them of ambiguity and inflicts violence on marginalized identities. — Source: [The TWIML AI Podcast]
- On Re-framing Bias: The AI industry should move away from the sanitized term "bias" and instead use precise descriptors like "harm," "injustice," and "oppression" to reflect lived realities. — Source: [FAccT Conference]
- On Technological Fixes: Treating algorithmic harms as merely mathematical errors completely ignores the societal power structures that created them. — Source: [arXiv]
- On Disproportionate Impact: Vulnerable and minoritized groups bear the brunt of algorithmic failures, while privileged groups retain the agency to opt out or challenge the system. — Source: [Silicon Republic]
- On Defining AI Ethics: True AI ethics cannot be achieved through a consequentialist, Western lens; it requires centering the communities most impacted by automation. — Source: [Stanford HAI]
- On Lived Experiences: Quantitative data can never fully capture the lived experiences of those who suffer under algorithmic oppression. — Source: [Patterns Journal]
- On Predictive Policing: Automated systems deployed in criminal justice often merely launder historical prejudices through the veneer of high-tech objectivity. — Source: [arXiv]
- On Structural Inequalities: Algorithmic harm is rarely a discrete incident; it is a continuation of historical and asymmetrical power hierarchies. — Source: [Stanford HAI]
- On Moving Beyond Fairness: "Algorithmic fairness" frameworks are insufficient when the underlying systems themselves are designed to maximize profit at the expense of human dignity. — Source: [ResearchGate]
Part 2: Dataset Auditing and The Fallacy of Scale
- On the Scaling Hypothesis: "We should take the claim that scale makes everything better with a big bucket of salt. Because what our research has shown is that scale actually makes things worse, if what you are aiming for is a just, equitable, and fair model." — Source: [TIME]
- On Hate in Datasets: "We wanted to test the hypothesis that as you scale up, your problems disappear... We found that as datasets scale, hateful content also scales." — Source: [TIME]
- On Uncurated Data: The reigning sentiment of "scale the model, scale the data, scale the compute" ignores the severe consequences of training models on uncurated internet dumps. — Source: [arXiv]
- On Amplifying Harm: "Some in the field claim scale is a solution to bias and discrimination — a way to drown out the noise. But this research shows the polar opposite is true: Scale only degrades the datasets further, amplifying bias and causing real-world harm." — Source: [Mozilla Foundation]
- On Data Swamps: Datasets like C4 and LAION are essentially "data swamps," inheriting and perpetuating the worst biases, stereotypes, and toxicities found on the web. — Source: [arXiv]
- On the Toll of Auditing: "Most of the time, my screen is not safe for work. I used to love working in a café; now I can’t." — Source: [TIME]
- On Inadequate Filters: Automated filtering mechanisms, such as CLIP-based similarity filtering, are woefully insufficient at removing slurs, pornography, and aggressive text from massive training corpora. — Source: [arXiv]
- On Foundational Flaws: "Data sets are foundational. If there is a problem with the data sets, there is a problem with AI." — Source: [MacArthur Foundation]
- On Consent and Scraping: The mass harvesting of data to build large models fundamentally violates the consent and privacy of individuals across the globe. — Source: [arXiv]
- On Auditing as Intervention: Rigorous documentation and independent auditing of training data are critical, non-negotiable interventions for responsible AI development. — Source: [ResearchGate]
Part 3: Relational Ethics and Interconnectedness
- On Human Nature: "Basically, you cannot treat cognition or the person as something that exists on its own island, but something that is always interactive, relational..." — Source: [UCD Profile]
- On Moral Relationality: Personhood is defined not by individual isolation but by open-ended, indeterminable relationships with others, formed in a continuous flow of relations. — Source: [Radical AI Podcast]
- On the Möbius Strip Metaphor: "Similarly, I believe, dichotomous approaches or efforts to neatly separate inextricably linked concepts such as nature and nurture, object and subject, mind and body, person and world, online and 'real', are hopeless." — Source: [Abeba's Blog]
- On Relational Ethics: AI ethics must shift from individualistic, rational frameworks toward a relational understanding that centers the community and our obligations to one another. — Source: [arXiv]
- On Fluid Entanglements: Attempting to cleanly sever humans from their environment is a futile endeavor; we must instead embrace fluid and dynamic entanglements. — Source: [Abeba's Blog]
- On Community over Individual: Western philosophical frameworks often fail AI ethics because they prioritize the autonomous individual over the health of the collective. — Source: [The TWIML AI Podcast]
- On Shared Responsibility: If humans are inherently interconnected, then the harms caused by AI to one marginalized group are a tear in the fabric of society as a whole. — Source: [Radical AI Podcast]
- On Ubuntu Philosophy: African philosophical concepts like Ubuntu—"I am because we are"—offer a vital corrective to the hyper-individualism embedded in modern tech development. — Source: [Chatham House]
- On the Limits of Rationality: Purely rational approaches to ethics fall short when dealing with complex, adaptive human systems that are defined by their social ties. — Source: [Stanford HAI]
- On Empathy in Design: Recognizing our relational nature demands that technologists build systems with deep empathy and respect for societal interconnectedness. — Source: [Patterns Journal]
Part 4: Embodied Cognition and Reductionism
- On Situated Cognition: Cognition does not happen exclusively within the brain; it is deeply situated within the world and relies on our physical bodies and environments. — Source: [Scribd Archive]
- On the Orthodox View: The traditional cognitive science approach of reducing the mind to a mere information-processing machine is a flawed and limiting framework. — Source: [Radical AI Podcast]
- On Human Unpredictability: Humans are ambiguous, indeterminable, and inherently unpredictable, which is why static machine learning models consistently fail to capture human nuance. — Source: [NIH PubMed]
- On Complex Adaptive Systems: People must be understood as complex adaptive systems, constantly changing in response to their physical and social surroundings. — Source: [UCD Profile]
- On the Illusion of Measurement: You cannot perfectly quantify human behavior; algorithms that claim to predict criminality or emotion are peddling pseudoscience. — Source: [Stanford HAI]
- On Biological Reality: The body is not merely a vehicle for the brain; physical interactions with the world are the very foundation of how we think and learn. — Source: [Abeba's Blog]
- On Rejecting Dichotomies: The rigid separation of mind and body is an outdated Cartesian artifact that continues to misguide artificial intelligence research. — Source: [Abeba's Blog]
- On the Danger of Reductionism: When we reduce human beings to data points in a spreadsheet, we lose the context necessary to treat them with dignity. — Source: [arXiv]
- On the Limits of AI Simulation: Because AI lacks a body and a social environment, it cannot achieve true human-like cognition or understanding. — Source: [Ologies Podcast]
Part 5: Algorithmic Colonization
- On Modern Dominance: While "traditional colonialism seeks unilateral power and domination over colonized people" through physical force, "colonialism in the age of AI takes the form of 'state-of-the-art algorithms' and 'AI solutions' to social problems." — Source: [Oxford Academic]
- On Digital Extraction: Western tech monopolies harvest data from the Global South, creating extractive loops that mirror historical colonial resource extraction. — Source: [ETC Group]
- On Imposed Solutions: Exporting Western AI models to African nations often results in tools that are unfit for local problems, suffocating indigenous innovation. — Source: [Silicon Republic]
- On Tech Dependency: The influx of Silicon Valley infrastructure into developing nations fosters a dangerous cycle of dependency on foreign algorithms. — Source: [Oxford Academic]
- On Privilege and Agency: "The more privileged you are, the more agency you have to decide how and in what way it impacts you and what you can avoid." — Source: [Silicon Republic]
- On Decolonizing AI: Decoloniality in tech is not a metaphor; it requires actively shifting power and ownership back to local communities. — Source: [Chatham House]
- On Local Value Systems: AI systems deployed in the Global South must be built from the ground up, centering local community needs and values rather than foreign corporate interests. — Source: [Radical AI Podcast]
- On Epistemic Injustice: When Western algorithms define what is true or normal, they commit epistemic violence against marginalized cultures. — Source: [arXiv]
- On the Guise of Progress: Technological imperialism often disguises itself as humanitarian aid or developmental progress, masking its true extractive nature. — Source: [Oxford Academic]
Part 6: The Illusion of AI Objectivity
- On the Veneer of Math: Machine learning models often impose a veneer of objectivity that successfully hides the subjective, human biases embedded within their code. — Source: [Stanford HAI]
- On Automating Ambiguity: Human ambiguity cannot be automated; attempts to do so simply force people into unnatural, reductionist categories. — Source: [arXiv]
- On Data as History: Data is never neutral. It is a historical artifact that reflects the inequalities, prejudices, and power dynamics of the society that produced it. — Source: [arXiv]
- On Neutral Algorithms: The persistent belief that algorithms are purely mathematical and therefore unbiased is one of the most dangerous myths in modern computer science. — Source: [ResearchGate]
- On Subjectivity in Design: Every step of AI development—from dataset curation to hyperparameter tuning—is laden with subjective human choices. — Source: [Scribd Archive]
- On Mathematical Washing: Using complex mathematics to justify discriminatory outcomes is a form of math-washing that shields developers from accountability. — Source: [FAccT Conference]
- On the Ground Truth Fallacy: In social contexts, there is rarely a singular ground truth for an algorithm to find; optimizing for one often silences competing, valid perspectives. — Source: [arXiv]
- On Replicating the Past: Because models learn from historical data, an objective AI is merely one that perfectly replicates the injustices of the past. — Source: [Patterns Journal]
- On Questioning Output: Society must normalize a deep, systemic skepticism toward any machine output that claims to offer an objective assessment of human character. — Source: [ETC Group]
Part 7: Accountability and Meaningful Regulation
- On Meaningful Consequences: "To tackle the discrimination perpetuated by AI systems, then, we need meaningful and enforceable regulations alongside intentional auditing: regulations that ensure companies and governments face appropriate consequences when they fail to adequately protect the public." — Source: [Alan Turing Institute]
- On Corporate PR: "‘AI for Good’, but good for Whom? Good PR for big tech corporations? Good for laundering accountability?" — Source: [AI Accountability Lab]
- On Shifting the Burden: The burden of proof regarding AI safety must shift from the victims of algorithmic harm to the corporations deploying the systems. — Source: [Ologies Podcast]
- On Embedded Systems: Because AI is inherently intertwined with critical societal infrastructures, voluntary industry guidelines are vastly insufficient; strict legal guardrails are required. — Source: [MacArthur Foundation]
- On Ethics vs. Regulation: Ethics without enforcement is often co-opted by tech monopolies to stall or dilute actual government regulation. — Source: [Research Ireland]
- On Auditing Standards: Independent, third-party dataset auditing must become a standardized, legally mandated requirement before models are released to the public. — Source: [arXiv]
- On Rejecting Inevitability: We must reject the tech-industry narrative that AI deployment is an unstoppable force of nature; it is a human-directed process that can be regulated. — Source: [TIME]
- On Empowering Communities: True accountability means giving impacted communities the legal mechanisms to audit, contest, and dismantle harmful algorithms. — Source: [AI Accountability Lab]
- On Structural Interventions: Technical tweaks will never fix systemic harm; we need structural interventions that address the profit motives driving reckless AI development. — Source: [Patterns Journal]
Part 8: The Future of Responsible AI
- On Human Dignity: "A future where human dignity, justice, and rights guide AI development is possible. Continual dialogues and meaningful measures for accountability in the development and deployment of AI systems is central to achieving this." — Source: [Research Ireland]
- On Environmental Impact: A responsible AI future must urgently address the massive carbon footprint and ecological devastation caused by training massive models. — Source: [Ologies Podcast]
- On Robot Rights: Debating robot rights is a dangerous distraction from the very real, present-day human rights abuses perpetrated by algorithmic systems. — Source: [Radical AI Podcast]
- On Refusal: Sometimes the most ethical approach to designing an AI system for a sensitive social context is to refuse to build it at all. — Source: [Stanford HAI]
- On Interdisciplinary Collaboration: The future of tech requires computer scientists to step outside their silos and engage deeply with sociologists, philosophers, and critical race theorists. — Source: [UCD Profile]
- On Slower Tech: The industry must abandon the mantra of moving fast and breaking things in favor of slow, deliberate, and deeply consultative development. — Source: [Chatham House]
- On Redefining Progress: Technological progress should not be measured by a model's parameter count, but by its ability to demonstrably improve the lives of the most vulnerable. — Source: [FAccT Conference]
- On Participatory Design: The communities most likely to be harmed by a system must have a central voice in its design, deployment, and governance. — Source: [arXiv]
- On Hope and Friction: Creating friction against rapid, unchecked AI deployment is an act of hope—it is the assertion that a more equitable technological future is still within our grasp. — Source: [TIME]