Lessons from Aman Khan

Aman Khan is the Head of Product at Arize AI and a former product leader at Spotify, Apple, and Cruise. He argues that product managers must abandon informal "vibe checks" and adopt systematic evaluation frameworks to test AI models and build reliable software. This profile covers his approach to AI product development, non-deterministic design, and rapid prototyping.

Part 1: Breaking into AI Product Management

On AI PM Fundamentals: "To break into AI product management, you must shift your focus from writing detailed feature specs to deeply understanding model capabilities and failure modes." — Source: Lenny's Podcast
On Technical Intuition: "You do not need a machine learning background to be an AI PM, but you do need enough technical intuition to know when an engineering constraint is real versus when it is just a lack of tooling." — Source: Aman Khan's Substack
On the Role of the PM: "In traditional software, PMs define the user journey. In AI software, PMs define the acceptable bounds of unpredictability." — Source: Product Growth Podcast
On Building Intuition: "The best way to build AI intuition is to build your own prototypes. Stop reading about LLMs and start breaking them." — Source: Lenny's Newsletter
On Transitioning Roles: "If you want to move into AI product, start by finding an AI problem within your current domain rather than trying to jump to an AI infrastructure company immediately." — Source: Lenny's Podcast
On Continuous Learning: "The half-life of knowledge in AI is incredibly short. Focus on learning the underlying concepts of evaluation and observability rather than memorizing today's state-of-the-art model benchmarks." — Source: Supra Insider
On User Empathy: "AI introduces friction when the model gets it wrong. Your job as a PM is to design the fallback experience, not just the happy path." — Source: Product Growth Podcast
On Cross-Functional Collaboration: "AI PMs must sit closer to the data science and ML engineering teams than ever before. You cannot treat the model as a black box." — Source: Mind the Product
On Defining Success: "Success in an AI product is rarely a binary metric. It is usually a distribution of outcomes that you have to continuously measure and improve." — Source: Arize AI Blog

Part 2: Moving Beyond "Vibe Checks"

On the Trap of Vibe Checks: "Vibe checks are when you look at five outputs, nod, and ship it. This is the fastest way to introduce silent failures into production." — Source: Lenny's Newsletter
On Scaling Quality: "You cannot scale a product if your quality assurance relies on a human reading every output to see if it 'feels right'." — Source: Product Growth Podcast
On Regression Testing: "Vibe checks completely fail to catch regressions. A prompt tweak might improve one edge case but degrade performance across a hundred others." — Source: Arize AI Blog
On Systematic Measurement: "Moving beyond vibe checks means defining specific, measurable criteria for what 'good' looks like and automating the testing of those criteria." — Source: Supra Insider
On the Illusion of Competence: "LLMs are incredibly good at sounding confident while being entirely wrong. Vibe checks fall prey to this illusion of competence." — Source: Lenny's Podcast
On Developer Velocity: "When you rely on vibe checks, developer velocity slows down because engineers are afraid to make changes without breaking the unknown." — Source: Aman Khan's Substack
On Stakeholder Trust: "You cannot build stakeholder trust in an AI product based on vibes. You need hard data showing precision, recall, and safety bounds." — Source: Mind the Product
On Early Stage vs. Production: "Vibe checks are fine for the first 24 hours of exploring an idea. They are unacceptable for the day you release it to users." — Source: Maven AI Courses
On the True Cost: "The hidden cost of vibe checks is the technical debt you accumulate in user trust. Once users learn your AI is unreliable, they stop using it." — Source: Product Growth Podcast
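
The regression point above can be made concrete with a small comparison harness. What follows is a minimal sketch, not a description of any tooling Khan or Arize uses: the `EvalCase` type, the substring-based `passes` check, and the sample questions and outputs are all hypothetical, chosen only to show how a fixed test set surfaces a regression that a five-output vibe check would miss.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class EvalCase:
    case_id: str
    prompt: str
    must_contain: str  # hypothetical pass criterion: a phrase the answer needs


def passes(case: EvalCase, output: str) -> bool:
    """Case-insensitive substring check; real criteria would be richer."""
    return case.must_contain.lower() in output.lower()


def compare_versions(
    cases: List[EvalCase],
    baseline_outputs: Dict[str, str],
    candidate_outputs: Dict[str, str],
) -> Dict[str, List[str]]:
    """List the cases a prompt change broke and the cases it fixed."""
    regressions, improvements = [], []
    for case in cases:
        before = passes(case, baseline_outputs[case.case_id])
        after = passes(case, candidate_outputs[case.case_id])
        if before and not after:
            regressions.append(case.case_id)
        elif after and not before:
            improvements.append(case.case_id)
    return {"regressions": regressions, "improvements": improvements}


# Illustrative data only: outputs recorded from two prompt versions.
cases = [
    EvalCase("refund", "How do I get a refund?", "30 days"),
    EvalCase("hours", "When are you open?", "9am"),
    EvalCase("shipping", "Do you ship abroad?", "international"),
]
baseline_outputs = {
    "refund": "Refunds are accepted within 30 days.",
    "hours": "We open at 9am on weekdays.",
    "shipping": "We only ship domestically.",
}
candidate_outputs = {
    "refund": "Refunds are available on request.",
    "hours": "We open at 9am on weekdays.",
    "shipping": "Yes, we offer international shipping.",
}

result = compare_versions(cases, baseline_outputs, candidate_outputs)
print(result)
```

In this toy data the prompt tweak fixes the shipping answer but silently drops the refund window, which is the kind of trade-off that only shows up when every case is rerun on every change.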

Part 3: Evaluations as the New PRD

On Product Requirements: "Evals are the new PRD. If you cannot write an evaluation for a feature, you do not understand the feature well enough to build it." — Source: Lenny's Newsletter
On Aligning Teams: "A shared evaluation set is the single best tool for aligning product and engineering teams on what the model is supposed to do." — Source: Arize AI Blog
On Ground Truth Data: "The hardest part of building evals is not the code; it is defining the ground truth data that represents user intent." — Source: Supra Insider
On LLMs as Evaluators: "Using LLMs to evaluate other LLMs is powerful, but it requires as much tuning and testing as the primary application model." — Source: Learning from Machine Learning Podcast
On Continuous Improvement: "An eval is not a one-time test. It is a living document that should grow every time you discover a new failure mode in production." — Source: Aman Khan's Substack
On Specificity: "Good evals test for specific behaviors like tone, factual accuracy, format constraints, and safety guidelines. Broad evals are useless." — Source: Lenny's Podcast
On the Feedback Loop: "The speed at which you can run evals and get results determines the iteration speed of your entire AI team." — Source: Product Growth Podcast
On Prioritization: "When deciding what to build next, look at your eval failure rates. Your biggest product opportunities are hidden in the queries your model consistently fails." — Source: Arize AI Blog
On Accountability: "Evals hold the model accountable to the user experience. They translate ambiguous user needs into concrete technical thresholds." — Source: Mind the Product
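
One way to read "evals as the new PRD" is as a list of named, specific checks, each paired with a pass-rate threshold the team has agreed on. The sketch below assumes exactly that shape. The three checks, the thresholds, and the sample outputs are hypothetical placeholders rather than anything from the quoted sources, and a real suite would likely mix code checks like these with human labels or LLM-based graders.

```python
import json
from typing import Callable, Dict, List, Tuple


def is_valid_json(output: str) -> bool:
    """Format constraint: the response must parse as JSON."""
    try:
        json.loads(output)
        return True
    except json.JSONDecodeError:
        return False


def under_word_limit(output: str, limit: int = 50) -> bool:
    """Length constraint: the response stays within a word budget."""
    return len(output.split()) <= limit


def avoids_apology(output: str) -> bool:
    """Tone constraint (hypothetical): the response does not apologize."""
    return "sorry" not in output.lower()


# Each requirement is a (check, minimum pass rate) pair. Thresholds are assumed.
CRITERIA: Dict[str, Tuple[Callable[[str], bool], float]] = {
    "valid_json": (is_valid_json, 0.9),
    "under_word_limit": (under_word_limit, 1.0),
    "avoids_apology": (avoids_apology, 0.7),
}


def run_evals(outputs: List[str], criteria=CRITERIA) -> Dict[str, dict]:
    """Compute the pass rate per criterion and compare it to its threshold."""
    if not outputs:
        raise ValueError("need at least one output to evaluate")
    report = {}
    for name, (check, threshold) in criteria.items():
        pass_rate = sum(check(o) for o in outputs) / len(outputs)
        report[name] = {
            "pass_rate": pass_rate,
            "threshold": threshold,
            "meets_threshold": pass_rate >= threshold,
        }
    return report


# Illustrative model outputs only.
sample_outputs = [
    '{"answer": "yes"}',
    '{"answer": "no"}',
    "Sorry, I cannot help with that.",
    '{"answer": "maybe"}',
]
report = run_evals(sample_outputs)
for name, row in report.items():
    print(name, row)
```

Keeping each criterion narrow makes the failure report actionable: a drop in `valid_json` points at formatting, not at a vague sense that quality declined.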

Part 4: Developing AI Product Sense

On Defining AI Product Sense: "AI product sense is the ability to predict how a non-deterministic system will behave in the wild and to design an interface that accommodates that behavior." — Source: Maven AI Courses
On Model Capabilities: "You develop product sense by intimately understanding what models are good at (synthesis, formatting) and what they are bad at (logic, math, spatial reasoning)." — Source: Lenny's Newsletter
On the Personal OS: "Build a personal operating system using AI tools. Automating your own workflows is the fastest way to understand the friction points users face." — Source: Aman Khan's Substack
On Latency vs. Quality: "A core part of AI product sense is knowing when users will accept higher latency for a better answer versus when they need an immediate, if simpler, response." — Source: Product Growth Podcast
On the Chat Interface: "Chat is a fallback interface. Good AI product sense means designing specific workflows where the AI operates behind the scenes." — Source: Supra Insider
On Data Flywheels: "An AI product must get better as people use it. If your product does not capture data to improve the next interaction, you lack AI product sense." — Source: Arize AI Blog
On Managing Expectations: "You have to design the product in a way that sets the right expectations. If it looks like a search bar, people will treat it like Google." — Source: Lenny's Podcast
On Edge Cases: "In traditional software, edge cases are rare. In generative AI, edge cases are the default state. You must design for them." — Source: Mind the Product
On Problem Selection: "The most important product decision is choosing a problem where the model's error rate is acceptable to the end user." — Source: Learning from Machine Learning Podcast

Part 5: Non-Deterministic Design Principles

On Determinism: "Users expect software to behave the same way every time they click a button. Generative AI breaks this expectation by design." — Source: Learning from Machine Learning Podcast
On Graceful Degradation: "Design your system so that when the AI fails, it fails gracefully. Provide standard UI fallbacks rather than a broken experience." — Source: Maven AI Courses
On User Correction: "Build interfaces that make it easy for users to correct the AI. This fixes the immediate problem and provides data for your evals." — Source: Aman Khan's Substack
On Transparency: "Do not hide the AI. Tell the user when a response is generated by a model so they adjust their trust levels accordingly." — Source: Lenny's Podcast
On Steerability: "Users need to feel in control. Give them knobs and dials to steer the AI's output rather than forcing them to guess the right prompt." — Source: Product Growth Podcast
On Handling Hallucinations: "Treat hallucinations as a UI problem, not just a model problem. How do you design an interface that encourages the user to verify facts?" — Source: Arize AI Blog
On the Illusion of Choice: "Sometimes offering fewer options generated by AI is better than an open-ended chat that paralyzes the user." — Source: Supra Insider
On Statefulness: "Non-deterministic systems feel more reliable when they maintain context across sessions. Statefulness covers a lot of model flaws." — Source: Mind the Product
On Feedback Mechanisms: "A thumbs up or thumbs down button is rarely enough. Capture the specific text the user edited to understand why the model failed." — Source: Lenny's Newsletter
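
The idea of capturing what the user edited, rather than only a thumbs up or down, can be sketched with Python's standard `difflib` module. This is an illustrative assumption about how such capture might work, not a documented implementation: the `capture_edit` function and its record format are hypothetical, and the word-level diff is one of several reasonable granularities.

```python
import difflib
from typing import Dict, List, Union


def capture_edit(model_output: str, user_final: str) -> Dict[str, Union[float, List[dict]]]:
    """Record which words the user changed in the model's draft."""
    model_words = model_output.split()
    user_words = user_final.split()
    matcher = difflib.SequenceMatcher(a=model_words, b=user_words, autojunk=False)

    changes = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            continue  # keep only the spans the user actually touched
        changes.append(
            {
                "op": op,  # "replace", "delete", or "insert"
                "model_text": " ".join(model_words[i1:i2]),
                "user_text": " ".join(user_words[j1:j2]),
            }
        )
    return {"similarity": matcher.ratio(), "changes": changes}


# Illustrative example: the user corrects one factual detail.
feedback = capture_edit(
    "The meeting is on Tuesday at noon",
    "The meeting is on Thursday at noon",
)
print(feedback["changes"])
```

Records like these can be appended to an evaluation set as new test cases, since each one pairs a model output with the correction a real user wanted.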

Part 6: Observability and the ML Lifecycle

On Production Monitoring: "Shipping an AI feature without observability is like flying a plane blindfolded. You need to know what users are asking and what the model is answering." — Source: Arize AI Blog
On Data Drift: "Models degrade over time because user behavior changes. Observability tools help you catch data drift before it impacts retention." — Source: Learning from Machine Learning Podcast
On Tracing: "When an LLM chain fails, you need tracing to see which specific step (retrieval, prompt construction, or generation) caused the error." — Source: Product Growth Podcast
On Prompt Management: "Treat prompts as code. They need version control, testing, and monitoring in production just like any other software component." — Source: Supra Insider
On Cost Visibility: "LLM costs can scale exponentially. Good observability means tracking token usage and cost on a per-feature and per-user basis." — Source: Aman Khan's Substack
On Safety and Abuse: "Monitoring is your first line of defense against prompt injection and malicious use. You cannot stop it if you cannot see it." — Source: Lenny's Podcast
On Closing the Loop: "The ML lifecycle is only complete when production data flows back into your evaluation sets to improve the next iteration." — Source: Maven AI Courses
On Root Cause Analysis: "When a user reports a bad response, observability allows you to replay the exact context and prompt that generated it." — Source: Arize AI Blog
On Metric Selection: "Do not just monitor latency and error rates. Monitor semantic similarity to expected answers and tone consistency." — Source: Mind the Product
On Proactive Intervention: "The goal of observability is to detect a spike in hallucinations and roll back a prompt change before users start complaining." — Source: Product Growth Podcast
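
The rollback scenario in the last quote can be sketched as a comparison of flagged-response rates between two prompt versions. Everything specific here is an assumption made for illustration: the `ResponseLog` record, the `flagged_hallucination` field (presumed to be set by an upstream eval or reviewer), and the `max_increase` and `min_samples` defaults are hypothetical and would need tuning against real traffic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ResponseLog:
    prompt_version: str
    flagged_hallucination: bool  # assumed to come from an upstream eval


def hallucination_rate(logs: List[ResponseLog], version: str) -> Optional[float]:
    """Share of responses flagged for a given prompt version, or None if unseen."""
    relevant = [log for log in logs if log.prompt_version == version]
    if not relevant:
        return None
    return sum(log.flagged_hallucination for log in relevant) / len(relevant)


def should_roll_back(
    logs: List[ResponseLog],
    baseline_version: str,
    candidate_version: str,
    max_increase: float = 0.05,  # hypothetical tolerance
    min_samples: int = 20,       # hypothetical minimum before judging
) -> bool:
    """Recommend rollback when the candidate's flagged rate exceeds baseline by too much."""
    candidate_count = sum(1 for log in logs if log.prompt_version == candidate_version)
    if candidate_count < min_samples:
        return False  # not enough traffic on the new prompt to decide
    baseline = hallucination_rate(logs, baseline_version)
    candidate = hallucination_rate(logs, candidate_version)
    if baseline is None or candidate is None:
        return False
    return candidate - baseline > max_increase


# Illustrative traffic: v1 has 2 flags in 100 responses, v2 has 6 flags in 50.
logs = (
    [ResponseLog("v1", i < 2) for i in range(100)]
    + [ResponseLog("v2", i < 6) for i in range(50)]
)
decision = should_roll_back(logs, baseline_version="v1", candidate_version="v2")
print("roll back:", decision)
```

A fixed-difference rule like this is deliberately simple. A production system would more plausibly use time windows and a statistical test, but the shape of the decision is the same.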

Part 7: Prototyping with AI Agents

On Cursor and Claude: "Tools like Cursor and Claude Code have fundamentally changed the PM role. PMs can now build functional prototypes without waiting for engineering resources." — Source: Maven AI Courses
On Rapid Iteration: "The ability to prototype an AI feature in an afternoon allows you to test the feeling internally before committing to a rigid development cycle." — Source: Aman Khan's Substack
On Understanding Constraints: "When you build it yourself with an AI coding agent, you feel the latency and context window limitations viscerally. It makes you a better PM." — Source: Lenny's Newsletter
On Communicating with Engineers: "Handing an engineer a working, albeit messy, prototype built in Cursor is infinitely better than handing them a ten-page text document." — Source: Supra Insider
On Testing Hypotheses: "Use AI agents to quickly test whether a specific LLM can even perform the task you want before designing the entire UI around it." — Source: Product Growth Podcast
On Unlocking Creativity: "When the barrier to writing code drops, PMs can explore weird, experimental ideas that would never get prioritized in a sprint planning meeting." — Source: Learning from Machine Learning Podcast
On the PM Toolkit: "Cursor is not just an engineering tool; it is a product discovery tool. It should be open on your desk alongside Figma and Jira." — Source: Mind the Product
On Learning by Doing: "You cannot learn prompt engineering by reading about it. You have to write code, call the API, and see the JSON break." — Source: Lenny's Podcast
On Agentic Workflows: "Building prototypes with agents teaches you how to break complex tasks into smaller, sequential steps, which is exactly how you need to design AI products." — Source: Aman Khan's Substack
On Shifting Paradigms: "We are moving from PMs defining what to build, to PMs building the first draft themselves." — Source: Arize AI Blog

Part 8: Lessons from Spotify, Apple, and Cruise

On Scale and Precision: "Working on autonomous vehicles at Cruise teaches you that 99 percent accuracy is completely useless if the 1 percent results in a catastrophic failure. Evals matter." — Source: Lenny's Podcast
On Personalization: "Spotify mastered the art of algorithmic curation. The lesson is that the AI must feel deeply personal, but the interface must remain simple." — Source: Product Growth Podcast
On Consumer Expectations: "Consumers do not care about the underlying model architecture. They care about whether the Apple feature solves their problem elegantly and privately." — Source: Aman Khan's Substack
On Hardware and Software Integration: "The best AI experiences are tightly coupled with the hardware and the native OS constraints, not just an API wrapper." — Source: Supra Insider
On Feedback Loops: "Discover Weekly works because the feedback loop is frictionless. Listening to a song or skipping it trains the model without explicit user effort." — Source: Mind the Product
On Safety Critical Systems: "In autonomous driving, testing is the product. This mindset must be applied to generative AI if we want it to be used in enterprise environments." — Source: Learning from Machine Learning Podcast
On Data Infrastructure: "You cannot build good machine learning models without pristine data pipelines. The plumbing is often more important than the algorithm." — Source: Arize AI Blog
On Cross-Disciplinary Teams: "Building great AI requires musicians, designers, data scientists, and engineers sitting in the same room. Isolation breeds poor products." — Source: Lenny's Newsletter
On Launching AI: "When launching an AI feature, start with a narrow, highly constrained use case where you can guarantee the outcome before expanding the scope." — Source: Product Growth Podcast
On Career Trajectory: "The common thread across these tech companies is the shift toward systematic evaluation. Master evals, and you can build AI products anywhere." — Source: Maven AI Courses
