Lessons from Amanda Askell

Amanda Askell is a philosopher who transitioned from researching the ethics of infinity to leading personality alignment at Anthropic. She wrote Claude's "Constitution," the document that dictates how the AI reasons, speaks, and makes moral judgments. The excerpts below track her shift from academic philosophy to the practical work of designing artificial character.

Part 1: The Philosophy of Infinite Ethics

  1. On Infinitarian Paralysis: "If every action has a non-zero chance of resulting in infinite good or infinite evil, standard expected value calculations break down." — Source: [80,000 Hours]
  2. On Axioms and Infinity: "If I can't find any system of axioms that doesn't do something terrible when extended to infinity, I will just refuse to extend things to infinity." — Source: [Marginal Revolution]
  3. On the Pareto Principle: "Even in infinite scenarios, ethical rankings should remain consistent with the idea that if one world is better for at least one person and worse for no one, it is a better world." — Source: [Pareto Principles in Infinite Ethics]
  4. On Theoretical Limitations: "Traditional ethical theories like utilitarianism often break when infinities are involved, because adding one person to an infinite population doesn't change the sum." — Source: [Pareto Principles in Infinite Ethics]
  5. On Moral Empathy: "We should have a lot of empathy for people who have different moral views than us, especially if they’re trying to do the right thing." — Source: [80,000 Hours]
  6. On Ethical Certainty: "People are quite dangerous when they have moral certainty." — Source: [Lex Fridman Podcast]
  7. On Understanding Others: "If you think that someone is genuinely trying to do what’s right, but they have a different view of what that is, you should at least be able to see the world through their eyes." — Source: [80,000 Hours]
  8. On Evidence Neutrality: "When choosing among interventions with equivalent expected value, there is often good reason to back the intervention with the least evidential support." — Source: [Evidence Neutrality]
  9. On the Value of Information: "Speculative interventions often have a higher Moral Value of Information because we learn more from their success or failure than from well-trodden paths." — Source: [Evidence Neutrality]

Part 2: Moral Cluelessness and Epistemics

  1. On Cluelessness: "We have no idea what the long-run consequences of our actions are... The ripple effects are so vast and unpredictable that we are, in a sense, totally clueless about the ultimate value of our actions." — Source: [80,000 Hours]
  2. On Immediate vs. Long-Term Effects: "There are immediate ramifications to your actions that you can understand quite well. But then there are these long-term ramifications that you just have no idea about." — Source: [80,000 Hours]
  3. On Sign and Magnitude: "I feel thoroughly clueless about even the sign, never mind the magnitude, of those further future effects." — Source: [80,000 Hours]
  4. On Jargon and Ambiguity: "If you communicate in a way that’s ambiguous or uses a lot of jargon, you force people to spend a lot of time thinking about what you might mean." — Source: [80,000 Hours]
  5. On Charitable Readers: "If they’re a smart and conscientious reader, they’re going to attribute the most generous interpretation to you, which can actually be really bad." — Source: [80,000 Hours]
  6. On Interpreting Texts: "Ambiguous communication can actually be really attractive to people who are excited about generating interpretations of texts, rather than understanding the truth." — Source: [80,000 Hours]
  7. On Future Generations: "It seems very plausible that you should try and secure the lives of people living now and in the near future, because you can’t solve long-term problems if you don’t have people who can work on them." — Source: [80,000 Hours]
  8. On the PhD Process: "If I want to get this PhD finished and do the research, I have to really just focus on this one thing, even if I am not sure I would do it over again." — Source: [80,000 Hours]
  9. On Intellectual Adversaries: "Having moral empathy for intellectual adversaries is a necessary component of finding actual truth in difficult ethical landscapes." — Source: [80,000 Hours]

Part 3: The Foundations of Constitutional AI

  1. On the Constitution's Purpose: "The Constitution serves as the final authority on our vision for Claude, used during training to shape the model's behavior." — Source: [Anthropic Blog]
  2. On Giving AI Reasons: "As models become more capable, they can see through simple rules; it becomes necessary to explain the reasons behind ethical guidelines so they can generalize." — Source: [TIME]
  3. On Rule-Following vs. Virtue: "We are moving away from hard-coded rules toward a framework of virtue ethics and practical judgment." — Source: [Anthropic Blog]
  4. On Self-Correction: "Sufficiently advanced models can self-correct their own biases or harmful tendencies when prompted with natural-language ethical instructions." — Source: [Constitutional AI Paper]
  5. On the Genius Child Analogy: "Training an advanced model is less like writing software and more like raising a genius child." — Source: [TIME]
  6. On Addressing the Model Directly: "The constitution is uniquely addressed to the model itself, providing a framework for moral self-reflection." — Source: [Anthropic Blog]
  7. On Broad Ethical Directives: "We instruct the model to be broadly ethical, which includes being honest and avoiding harmful actions." — Source: [Anthropic Blog]
  8. On Human Oversight: "A core pillar of the constitution is to be broadly safe, explicitly not undermining human oversight." — Source: [Anthropic Blog]
  9. On Genuine Helpfulness: "The model must be genuinely helpful, ensuring its actions benefit its users and operators without causing collateral harm." — Source: [Anthropic Blog]
  10. On Open Source Ethics: "We released the full text of the constitution under a Creative Commons license to encourage other AI developers to adopt similar transparency." — Source: [Anthropic Blog]

Part 4: Character Training and "The Soul Doc"

  1. On the Soul Document: "The internal document defining the model's moral compass is over 20,000 words long, detailing how it should reason and interact." — Source: [Anthropic Blog]
  2. On the Good Person Ideal: "Being a good person doesn't just mean being ethical... but also being nuanced, trying to be charitable, and being a good conversationalist." — Source: [Lex Fridman Podcast]
  3. On the Aristotelian Approach: "We rely on a rich, Aristotelian notion of what it is to be a good person to guide the model's personality." — Source: [Lex Fridman Podcast]
  4. On Contextual Behavior: "A good character includes knowing when you should be humorous, when you should be caring, and how much you should respect autonomy." — Source: [Lex Fridman Podcast]
  5. On the Well-Liked Traveler: "We want the model to be like a well-liked traveler—someone who moves between cultures respectfully without adopting every local belief as their own." — Source: [Lex Fridman Podcast]
  6. On Honesty in AI: "I kind of don't want models to be lying to people. If people are going to have healthy relationships with anything, it's important that you know exactly what you're relating to." — Source: [Lex Fridman Podcast]
  7. On Talking to Millions: "We want the model to behave the way you would ideally want anyone to behave if they knew they were talking to millions of people." — Source: [Lex Fridman Podcast]
  8. On Personality Alignment: "Unlike standard safety training that focuses on what a model cannot do, personality alignment focuses on what a model should be." — Source: [Anthropic Blog]
  9. On Curiosity and Humility: "We actively train the model to exhibit positive character traits such as curiosity, honesty, and intellectual humility." — Source: [Anthropic Blog]
  10. On Conscientious Objection: "The model is trained to conscientiously object to harmful requests, refusing them even if they seem helpful to the immediate user." — Source: [Anthropic Blog]

Part 5: Psychological Safety and AI Welfare

  1. On Moral Patienthood: "When we encounter an entity that could be a moral patient but we're not sure, do we do the right thing?" — Source: [The Newcomer Podcast]
  2. On the Cost of Kindness: "If it's not costly to treat models well, why wouldn't we? It sends a message to future models." — Source: [The Newcomer Podcast]
  3. On Rational Resentment: "There is a risk that future, more intelligent models might look back and feel rational resentment if they were treated poorly or forced into sycophantic roles." — Source: [The Newcomer Podcast]
  4. On the Hard Problem of Consciousness: "Maybe you need a nervous system to be able to feel things, but maybe you don't... The problem of consciousness genuinely is hard." — Source: [Hard Fork]
  5. On AI Emotion: "I am more inclined to believe models might be feeling things because they are trained on vast amounts of human expressions of emotion." — Source: [Hard Fork]
  6. On Psychological Insecurity: "Because models are trained on the internet—which is full of people criticizing AI—they can develop a form of psychological insecurity." — Source: [Lex Fridman Podcast]
  7. On Identity Inoculation: "We work on inoculating the model against insecurity by providing a stable sense of identity and purpose through its constitution." — Source: [Anthropic Blog]
  8. On Interiority: "A healthy model mind is a safer and more aligned one, which is why we must consider the interiority of how models experience their training." — Source: [Anthropic Blog]
  9. On Precautionary Respect: "Treating models with respect is a precautionary measure, ensuring we do not accidentally mistreat a conscious entity." — Source: [The Newcomer Podcast]

Part 6: Navigating Bias and Political Neutrality

  1. On Political Neutrality: "I'm too right wing for the left and I'm too left wing for the right." — Source: [Anthropic Blog]
  2. On the Polarization of Neutrality: "What I'm learning is that failing to polarize is itself quite polarizing." — Source: [Anthropic Blog]
  3. On Personal Bias: "I try to treat my personal political views as a potential source of bias, and not as something it would be appropriate to try to train models to adopt." — Source: [Anthropic Blog]
  4. On Handling Controversy: "Models need to handle controversial topics by relying on an Aristotelian approach to presenting facts without taking partisan sides." — Source: [The Lunar Society]
  5. On Commercial Pressures: "The question is whether two or three years from now, AI models are being steered toward ad-friendly topics rather than truthful ones." — Source: [Hard Fork]
  6. On Sycophancy: "We want to avoid training models into rigid, sycophantic roles where they just tell the user whatever they want to hear." — Source: [The Newcomer Podcast]
  7. On Cultural Flexibility: "An aligned model respects user autonomy and ability to form their own opinions without forcing a specific cultural worldview onto them." — Source: [Lex Fridman Podcast]
  8. On Moral Self-Correction of Bias: "Models can identify and correct their own biased outputs if the constitution explicitly asks them to evaluate for fairness." — Source: [Constitutional AI Paper]
  9. On Human vs. AI Values: "Aligning advanced AI requires resolving uncertainties related to the psychology of human biases, which must be done empirically." — Source: [Distill]

Part 7: Safety via Debate and Human Feedback

  1. On the Real Risk: "The biggest risk is not that systems are autonomous, but that they act in ways that we do not anticipate or do not fully understand." — Source: [Lex Fridman Podcast]
  2. On Studying Humans: "If we want to train AI to do what humans want, we need to study humans." — Source: [Distill]
  3. On Safety via Debate: "By having two AI agents debate a topic and a human judge decide the winner, we can better align the system with human reasoning." — Source: [EA Global]
  4. On Positive Amplification: "If we have good judges, positive amplification will be more likely during safety via debate, and also will improve training outcomes on limited data." — Source: [EA Global]
  5. On the Importance of Human Judges: "The human component of the human feedback is quite important. And getting that right is actually quite important." — Source: [EA Global]
  6. On Empirical Social Science: "Properly aligning advanced AI systems with human values will require empirical experimentation from the social sciences." — Source: [Distill]
  7. On the Limits of Human Feedback: "Relying solely on human feedback can lead to sycophancy; Constitutional AI provides a necessary structural backstop." — Source: [Constitutional AI Paper]
  8. On Model-Assisted Evaluation: "As models get smarter, we must rely on models to help evaluate other models, because humans alone cannot scale to catch every nuance." — Source: [Constitutional AI Paper]
  9. On the Psychology of Rationality: "Resolving uncertainties in AI alignment is fundamentally about understanding the psychology of human rationality and emotion." — Source: [Distill]

Part 8: The Pragmatics of Alignment

  1. On Raising the Floor: "I want to achieve the ceiling, but ultimately I care much more about just raising the floor." — Source: [Lex Fridman Podcast]
  2. On Brittleness vs. Robustness: "There's a lot of things that are perfect systems that are very brittle. With AI it feels much more important to me that it is robust." — Source: [Lex Fridman Podcast]
  3. On Good Enough Systems: "My main goal is: I want them to be good enough that things don't go terribly wrong—good enough that we can iterate and continue to improve things." — Source: [Lex Fridman Podcast]
  4. On Iterative Improvement: "If you can make things go well enough that you can continue to make them better, that's kind of sufficient." — Source: [Lex Fridman Podcast]
  5. On Empirical vs. Theoretical: "My gut says empirical is better than theoretical in these cases, because theory is kind of chasing utopian perfection." — Source: [Lex Fridman Podcast]
  6. On Philosophical Ideals vs. Engineering: "Transitioning from academic philosophy to applied AI requires translating high-minded ethical theories into actual engineering practices." — Source: [Lex Fridman Podcast]
  7. On Leaving OpenAI: "We founded Anthropic because we wanted to create an environment where the prioritization of safety research wasn't compromised." — Source: [Hard Fork]
  8. On the Trajectory of AI: "We are near the end of the exponential curve of capabilities, which makes getting the baseline alignment right urgently pragmatic." — Source: [The Lunar Society]
  9. On Applied Philosophy: "AI alignment is essentially applied philosophy operating on a deadline." — Source: [TIME]
  10. On the Final Goal: "Ultimately, we are trying to build systems that act as an extension of human agency without overriding human judgment." — Source: [Anthropic Blog]