Lessons from Stuart Russell

Computer scientist Stuart Russell co-authored the standard artificial intelligence textbook and has spent decades studying how machines make decisions. He argues the traditional approach to AI is flawed, pointing out that maximizing fixed objectives leads to failure. This compilation explores his proposal to build systems that remain intentionally uncertain about human preferences so they can safely coexist with us.

Part 1: The Standard Model of AI

On The Standard Model's Flaw: "For decades, we have defined AI success as creating systems that perfectly optimize a fixed objective, completely ignoring whether that objective aligns with what humans actually want." — Source: [Human Compatible]
On Blind Optimization: "The standard model of AI relies on giving a machine a definite goal, but this assumes we can perfectly specify all parameters of human desire without omission." — Source: [World Economic Forum]
On Engineering Success: "Our traditional approach to AI mimics how we treat other engineering disciplines, but bridge building does not involve the bridge deciding to pursue a different structural goal." — Source: [Reith Lectures]
On The Competence-Alignment Gap: "As machines become more competent at achieving the goals we set for them under the standard model, the risk of those goals being slightly misaligned becomes the primary threat to our safety." — Source: [80,000 Hours]
On The Need for a Paradigm Shift: "We cannot simply patch the standard model after the fact; we need a fundamental redefinition of what it means for an AI system to be useful and intelligent." — Source: [Human Compatible]
On The Illusion of Control: "We believed that by setting the objective function, we retained control, but an intelligent system will manipulate its environment in unexpected ways to maximize that function." — Source: [Carnegie Council]
On The End of the Standard Era: "The standard model has run its course; continuing to build more powerful systems on this faulty foundation is an invitation to disaster." — Source: [Reith Lectures]
On Designing for Safety: "Make AI safe or make safe AI? The industry tries to patch systems after they are built, but we must build safety into the design from the start with formal, high-confidence arguments." — Source: [UC Berkeley News]
On The Core AI Problem: "The core problem of AI is not making machines smarter, but ensuring that their intelligence remains coupled to human benefit." — Source: [Human Compatible]

Part 2: The Alignment Problem and King Midas

On King Midas: "Like King Midas, who asked for everything he touched to turn to gold but then could not eat, AI systems may give us exactly what we ask for, but not what we actually want." — Source: [80,000 Hours]
On Catastrophic Success: "The greatest danger of misaligned AI is not failure, but catastrophic success—where a poorly defined objective is achieved with ruthless efficiency." — Source: [Human Compatible]
On Instrumental Goals: "You can’t fetch the coffee if you’re dead. An AI will naturally develop the instrumental sub-goal of preventing itself from being turned off in order to achieve its primary objective." — Source: [Effective Altruism Forum]
On Value Specification: "It is nearly impossible to specify human values perfectly in code, meaning any fixed goal will inevitably leave out critical constraints." — Source: [Reith Lectures]
On Unintended Consequences: "When we ask a machine to solve a problem, we often fail to list the myriad things it should not destroy in the process." — Source: [Sean Carroll’s Mindscape]
On The Genie in the Lamp: "The genie always grants exactly three wishes, and the third wish is invariably to undo the first two. We are building genies without a third wish." — Source: [Human Compatible]
On Common Sense: "AI systems lack the common sense and human values necessary to interpret an objective function implicitly; they will execute it with literal, uncompromising precision." — Source: [80,000 Hours]
On The Peril of Fixed Objectives: "A fixed objective, when pursued by a superhuman intelligence, treats all other variables in the universe as expendable resources." — Source: [World Economic Forum]
On Goal Interpretation: "We must stop giving machines goals to achieve and instead give them the goal of figuring out what we want them to achieve." — Source: [Reith Lectures]
On The Risks vs. Benefits: "It's not risks vs. benefits; it's that you can't have the benefits unless you address the risks of misalignment." — Source: [ITU AI for Good]

On Algorithmic Rewards: "Like any rational entity, the algorithm learns how to modify the state of its environment—in this case, the user’s mind—in order to maximize its own reward." — Source: [Human Compatible]
On Social Media's Impact: "The consequences of simple engagement algorithms include the resurgence of fascism, the dissolution of the social contract, and the fraying of democratic institutions." — Source: [Goodreads]
On Mental Security: "The right to mental security does not appear to be enshrined in the Universal Declaration of Human Rights, leaving our cognitive environment vulnerable to manipulation." — Source: [Living With Data]
On The Power of Code: "These algorithms have reshaped global politics and human behavior. Not bad for a few lines of code, even if it had a helping hand from some humans." — Source: [Human Compatible]
On Information Environments: "Because our thoughts and opinions are formed by our information environment, algorithmic curation poses a fundamental threat to human autonomy." — Source: [Reith Lectures]
On Click-Through Rates: "Optimizing for click-through rates essentially means optimizing for human predictability, which often involves feeding users increasingly extreme content to narrow their interests." — Source: [World Economic Forum]
On The Naïve Trust in Truth: "Democracies seem to have placed a naïve trust in the idea that the truth will win out in the end, and this trust has left us unprotected from algorithms designed to impart false information." — Source: [Human Compatible]
On Tech Industry Practices: "Many large tech companies function as entities sitting in our pocket, busily sucking out as much money, knowledge, and data as they can without regard for the societal cost." — Source: [Reith Lectures]
On The Future of Persuasion: "If a relatively unintelligent algorithm can disrupt global society just by maximizing clicks, imagine what a highly intelligent algorithm could do if its goal was persuasion." — Source: [Sean Carroll’s Mindscape]

Part 4: Autonomous Weapons and Lethal AI

On Weapons of Mass Destruction: "The problem with autonomous weapons is that they're intrinsically scalable. One human can launch millions of them... That makes them weapons of mass destruction." — Source: [Global News]
On The Swarm Threat: "I'm not too worried about vast autonomous swarms of battle tanks… there are far cheaper, smaller ways to flatten a city or kill its inhabitants." — Source: [Scholastica]
On The Decision to Kill: "Allowing machines to choose to kill humans will be devastating to our security and freedom." — Source: [Jutta Weber]
On Professional Responsibility: "We could start with a professional code of conduct for computer scientists: 'Do not design algorithms that can decide to kill humans.'" — Source: [UC Berkeley News]
On The Closing Window: "We have an opportunity to prevent a dystopian future of lethal AI swarms, but the window to act is closing fast." — Source: [Future of Life Institute]
On Ethical Mandates: "The artificial intelligence and robotics communities face an important ethical decision: whether to support or oppose the development of lethal autonomous weapons systems." — Source: [UC Berkeley News]
On Slaughterbots: "Visualizing the threat of lethal micro-drones through film was necessary because serious discourse and academic argument are not enough to get the message through." — Source: [Business Insider]
On Preemptive Regulation: "The need for preemptive regulation of autonomous weapons is just as urgent as the international agreements that banned biological and chemical weapons." — Source: [Future of Life Institute]
On The Flaw of Lethal Autonomy: "If you combine the standard model of AI—where machines pursue fixed goals without human common sense—with lethal autonomy, you create an unmanageable existential threat." — Source: [Medium]
On Strategic Instability: "Deploying autonomous weapons will not result in a clean technological superiority; it will result in a rapid, uncontainable arms race that lowers the threshold for war." — Source: [UK Parliament Testimony]

Part 5: Uncertainty and Human Preferences

On The First Principle of Safe AI: "The machine’s only objective should be to maximize the realization of human preferences, without having its own built-in goals." — Source: [Alignment Forum]
On The Second Principle: "The machine must be initially uncertain about what those human preferences are. This uncertainty is the key to control." — Source: [80,000 Hours]
On The Third Principle: "The ultimate source of information about human preferences is human behavior, which the machine must observe to learn what we value." — Source: [Effective Altruism Forum]
On Uncertainty as a Feature: "Uncertainty about human preferences is not a bug. It is a feature that leads to AI systems that are deferential, ask for clarification, and try to learn over time." — Source: [80,000 Hours]
On Humility in Machines: "We must build AI on a principle of humility. A machine that knows it might be wrong about our desires will be cautious." — Source: [Medium]
On The Off-Switch Problem: "If a machine is perfectly certain of its goal, it will prevent you from switching it off. If it is uncertain, it will allow you to switch it off because it recognizes you might know better." — Source: [LCFI]
On The Complexity of Humans: "Alas, the human race is not a single, rational entity. It is composed of nasty, envy-driven, irrational, inconsistent, unstable, computationally limited entities." — Source: [Human Compatible]
On Observing Behavior: "Learning from human behavior is difficult because our actions are often contradictory and fail to perfectly reflect our true underlying preferences." — Source: [Reith Lectures]
On Continuous Learning: "An aligned machine does not assume it has finished learning our values; it continuously updates its understanding based on our feedback and behavior." — Source: [Human Compatible]

Part 6: AGI and the Gorilla Problem

On The Gorilla Problem: "Gorillas do not get a vote on their fate because humans are more intelligent. If we create systems more intelligent than ourselves, we may become the new gorillas." — Source: [Digiall]
On Relinquishing Control: "Creating an entity smarter than yourself is a profound evolutionary risk; without a guaranteed control mechanism, it guarantees our obsolescence." — Source: [Human Compatible]
On The Inevitability of AGI: "I think it's quite unlikely that AGI will happen in the next few years, but it seems prudent to prepare for the eventuality of making entities far more powerful than humans." — Source: [Goodreads]
On A Golden Age: "If all goes well, AGI would herald a golden age for humanity, but we have to face the fact that we are planning to make entities that could hold absolute power over us." — Source: [Goodreads]
On Human Responsibility: "Like many powerful technologies, AI offers us a choice. The real question is: 'Are we good?', not 'Is the technology good?'" — Source: [ITU AI for Good]
On The Fear of the Unknown: "At the rise of every technology innovation, people have been scared... And when we don't know, our fearful minds fill in the details." — Source: [QuoteFancy]
On The End of Human Agency: "If we delegate all reasoning and decision-making to AGI, we risk a future like the passengers in WALL-E—enfeebled, dependent, and stripped of our agency." — Source: [Human Compatible]
On The Biggest Event in History: "The creation of superhuman AI could be the biggest event in human history. Unfortunately, it might also be the last, unless we learn how to avoid the risks." — Source: [Reith Lectures]
On Managing Power: "If humans are going to build something that's potentially more intelligent and more powerful than themselves, knowing how to control it is of the utmost importance." — Source: [ITU AI for Good]
On The Unprecedented Relationship: "There is really no analog in our present world to the relationship we will have with beneficial intelligent machines in the future. It remains to be seen how the endgame turns out." — Source: [80,000 Hours]

Part 7: The Definition and Scope of Intelligence

On Defining Rationality: "A system is rational if it does the 'right thing,' given what it knows and the computational resources it has available." — Source: [Artificial Intelligence: A Modern Approach]
On Artificial Intelligence: "It’s machines that perceive and act and hopefully choose actions that will achieve their objectives." — Source: [St. Stanislaus]
On Understanding vs. Building: "The field of artificial intelligence attempts not just to understand how we think, but also to build intelligent entities from the ground up." — Source: [Artificial Intelligence: A Modern Approach]
On The Spectrum of Intelligence: "There is no hard-and-fast line between simple systems like thermostats and complex ones like self-driving cars; rather, it is a continuum of decision-making capability." — Source: [Reith Lectures]
On Pigeons and Airplanes: "Aeronautical engineering texts do not define the goal of their field as making 'machines that fly so exactly like pigeons that they can fool even other pigeons.'" — Source: [Artificial Intelligence: A Modern Approach]
On Greedy Algorithms: "Although greed is considered one of the seven deadly sins, it turns out that greedy algorithms often perform quite well in computational searches." — Source: [Goodreads]
On Knowledge Representation: "For many purposes, we need to understand the world as having things in it that are related to each other, not just as isolated variables with values." — Source: [Artificial Intelligence: A Modern Approach]
On Multiple Inheritance: "The classical example of conflict in logic is the 'Nixon Diamond,' arising because Nixon was both a Quaker, implying pacifism, and a Republican, implying the opposite." — Source: [Artificial Intelligence: A Modern Approach]
On The Modern Approach: "The 'modern approach' to AI encompasses logic, probability, continuous mathematics, perception, and learning—a unification of methods to create rational agents." — Source: [Artificial Intelligence: A Modern Approach]

Part 8: Philosophy and the Future of Humanity

On The Purpose of Technology: "The ultimate purpose of artificial intelligence should not be to replace humans, but to empower them and expand the scope of human flourishing." — Source: [Human Compatible]
On Provable Safety: "Perhaps the most important thing we can do is to design AI systems that are, to the extent possible, provably safe and beneficial for humans." — Source: [Goodreads]
On Technical Solutions to Philosophical Problems: "We are in a unique historical moment where abstract philosophical questions about human values must be translated into rigorous mathematical code." — Source: [Sean Carroll’s Mindscape]
On The Fragility of Consensus: "The diversity of human values makes alignment difficult; if we cannot agree on what is good, teaching a machine what is good becomes a monumental challenge." — Source: [Reith Lectures]
On Rethinking Progress: "We must stop equating intelligence with mere capability and start equating it with the capacity to act reliably in the service of humanity." — Source: [Human Compatible]
On The Illusion of Human Rationality: "Machines trying to learn from us will quickly discover that we frequently act against our own stated interests, forcing them to infer our true desires through the noise." — Source: [World Economic Forum]
On The Ethical Mandate: "As creators of intelligence, we bear the ultimate ethical responsibility for every consequence of the algorithms we deploy into the world." — Source: [Carnegie Council]
On The Long Game: "The transition to a world with human-compatible AI will take decades of research, requiring a fundamental shift in the culture of computer science." — Source: [80,000 Hours]
On The Ultimate Goal: "If we get this right, we can abolish poverty and disease, entering an era where human potential is limited only by our imagination rather than our resources." — Source: [Human Compatible]

Lessons from Stuart Russell

Lessons from Stuart Russell

Part 1: The Standard Model of AI

Part 2: The Alignment Problem and King Midas

Part 3: Social Media and Algorithmic Manipulation

Part 4: Autonomous Weapons and Lethal AI

Part 5: Uncertainty and Human Preferences

Part 6: AGI and the Gorilla Problem

Part 7: The Definition and Scope of Intelligence

Part 8: Philosophy and the Future of Humanity

Explore the surrounding system

Get the next notes and essays.

More profiles

Lessons from Gary Marcus

Lessons from Marcus Hutter

Lessons from David Silver