
Lessons from Aman Khan
Aman Khan is the Head of Product at Arize AI and a former product leader at Spotify, Apple, and Cruise. He argues that product managers must abandon informal "vibe checks" and adopt systematic evaluation frameworks to test AI models and build reliable software. This profile covers his approach to AI product development, non-deterministic design, and rapid prototyping.
Part 1: Breaking into AI Product Management
- On AI PM Fundamentals: "To break into AI product management, you must shift your focus from writing detailed feature specs to deeply understanding model capabilities and failure modes." — Source: Lenny's Podcast
- On Technical Intuition: "You do not need a machine learning background to be an AI PM, but you do need enough technical intuition to know when an engineering constraint is real versus when it is just a lack of tooling." — Source: Aman Khan's Substack
- On the Role of the PM: "In traditional software, PMs define the user journey. In AI software, PMs define the acceptable bounds of unpredictability." — Source: Product Growth Podcast
- On Building Intuition: "The best way to build AI intuition is to build your own prototypes. Stop reading about LLMs and start breaking them." — Source: Lenny's Newsletter
- On Transitioning Roles: "If you want to move into AI product, start by finding an AI problem within your current domain rather than trying to jump to an AI infrastructure company immediately." — Source: Lenny's Podcast
- On Continuous Learning: "The half-life of knowledge in AI is incredibly short. Focus on learning the underlying concepts of evaluation and observability rather than memorizing today's state-of-the-art model benchmarks." — Source: Supra Insider
- On User Empathy: "AI introduces friction when the model gets it wrong. Your job as a PM is to design the fallback experience, not just the happy path." — Source: Product Growth Podcast
- On Cross-Functional Collaboration: "AI PMs must sit closer to the data science and ML engineering teams than ever before. You cannot treat the model as a black box." — Source: Mind the Product
- On Defining Success: "Success in an AI product is rarely a binary metric. It is usually a distribution of outcomes that you have to continuously measure and improve." — Source: Arize AI Blog
Part 2: Moving Beyond "Vibe Checks"
- On the Trap of Vibe Checks: "Vibe checks are when you look at five outputs, nod, and ship it. This is the fastest way to introduce silent failures into production." — Source: Lenny's Newsletter
- On Scaling Quality: "You cannot scale a product if your quality assurance relies on a human reading every output to see if it 'feels right'." — Source: Product Growth Podcast
- On Regression Testing: "Vibe checks completely fail to catch regressions. A prompt tweak might improve one edge case but degrade performance across a hundred others." — Source: Arize AI Blog
- On Systematic Measurement: "Moving beyond vibe checks means defining specific, measurable criteria for what 'good' looks like and automating the testing of those criteria." — Source: Supra Insider
- On the Illusion of Competence: "LLMs are incredibly good at sounding confident while being entirely wrong. Vibe checks fall prey to this illusion of competence." — Source: Lenny's Podcast
- On Developer Velocity: "When you rely on vibe checks, developer velocity slows down because engineers are afraid to make changes without breaking the unknown." — Source: Aman Khan's Substack
- On Stakeholder Trust: "You cannot build stakeholder trust in an AI product based on vibes. You need hard data showing precision, recall, and safety bounds." — Source: Mind the Product
- On Early Stage vs. Production: "Vibe checks are fine for the first 24 hours of exploring an idea. They are unacceptable for the day you release it to users." — Source: Maven AI Courses
- On the True Cost: "The hidden cost of vibe checks is the technical debt you accumulate in user trust. Once users learn your AI is unreliable, they stop using it." — Source: Product Growth Podcast
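The regression point above can be made concrete with a small sketch. It assumes each eval case already produces a pass/fail result for both the current prompt and a candidate prompt. The case names and results below are hypothetical, and `diff_eval_runs` is an illustrative helper, not part of any tool named in this profile.

```python
def diff_eval_runs(
    baseline: dict[str, bool], candidate: dict[str, bool]
) -> dict[str, list[str]]:
    """Compare per-case pass/fail results from two eval runs."""
    # Only compare cases that were run against both prompt versions.
    shared = sorted(baseline.keys() & candidate.keys())
    return {
        "fixed": [c for c in shared if not baseline[c] and candidate[c]],
        "regressed": [c for c in shared if baseline[c] and not candidate[c]],
    }


# Hypothetical results: case id -> did the output pass its check?
baseline = {"refund_policy": True, "shipping_time": True, "edge_unicode": False}
candidate = {"refund_policy": False, "shipping_time": True, "edge_unicode": True}

report = diff_eval_runs(baseline, candidate)
print(report)
```

Here the candidate prompt fixes one edge case and breaks a case that used to pass. Spot-checking five outputs could easily miss that trade-off.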
Part 3: Evaluations as the New PRD
- On Product Requirements: "Evals are the new PRD. If you cannot write an evaluation for a feature, you do not understand the feature well enough to build it." — Source: Lenny's Newsletter
- On Aligning Teams: "A shared evaluation set is the single best tool for aligning product and engineering teams on what the model is supposed to do." — Source: Arize AI Blog
- On Ground Truth Data: "The hardest part of building evals is not the code; it is defining the ground truth data that represents user intent." — Source: Supra Insider
- On LLMs as Evaluators: "Using LLMs to evaluate other LLMs is powerful, but it requires as much tuning and testing as the primary application model." — Source: Learning from Machine Learning Podcast
- On Continuous Improvement: "An eval is not a one-time test. It is a living document that should grow every time you discover a new failure mode in production." — Source: Aman Khan's Substack
- On Specificity: "Good evals test for specific behaviors like tone, factual accuracy, format constraints, and safety guidelines. Broad evals are useless." — Source: Lenny's Podcast
- On the Feedback Loop: "The speed at which you can run evals and get results determines the iteration speed of your entire AI team." — Source: Product Growth Podcast
- On Prioritization: "When deciding what to build next, look at your eval failure rates. Your biggest product opportunities are hidden in the queries your model consistently fails." — Source: Arize AI Blog
- On Accountability: "Evals hold the model accountable to the user experience. They translate ambiguous user needs into concrete technical thresholds." — Source: Mind the Product
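As a rough illustration of evals acting as requirements, the sketch below writes two specific behaviors (a format constraint and a length limit) as automated checks and compares the pass rate to a threshold. Everything in it is an assumption made for illustration: `fake_model` stands in for a real model call, and the cases, checks, and 0.9 threshold are invented.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    name: str
    prompt: str
    check: Callable[[str], bool]  # returns True when the output is acceptable


def is_valid_json(output: str) -> bool:
    """Format constraint: the output must parse as JSON."""
    try:
        json.loads(output)
        return True
    except json.JSONDecodeError:
        return False


def under_word_limit(limit: int) -> Callable[[str], bool]:
    """Length constraint: the output must use at most `limit` words."""
    return lambda output: len(output.split()) <= limit


def run_evals(model: Callable[[str], str], cases: list[EvalCase]) -> dict[str, bool]:
    """Run every case through the model and record pass/fail per case."""
    return {case.name: case.check(model(case.prompt)) for case in cases}


def fake_model(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM call.
    if "JSON" in prompt:
        return '{"status": "ok"}'
    return "Refunds are accepted within 30 days of purchase."


cases = [
    EvalCase("returns_json", "Reply in JSON with a status field.", is_valid_json),
    EvalCase("short_answer", "Summarize the refund policy.", under_word_limit(20)),
]

THRESHOLD = 0.9  # assumed acceptance bar, agreed between product and engineering
results = run_evals(fake_model, cases)
pass_rate = sum(results.values()) / len(results)
print(results, pass_rate >= THRESHOLD)
```

Each check tests one narrow behavior, so a failure points at a specific requirement rather than a general sense that the output is off.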
Part 4: Developing AI Product Sense
- On Defining AI Product Sense: "AI product sense is the ability to predict how a non-deterministic system will behave in the wild and to design an interface that accommodates that behavior." — Source: Maven AI Courses
- On Model Capabilities: "You develop product sense by intimately understanding what models are good at (synthesis, formatting) and what they are bad at (logic, math, spatial reasoning)." — Source: Lenny's Newsletter
- On the Personal OS: "Build a personal operating system using AI tools. Automating your own workflows is the fastest way to understand the friction points users face." — Source: Aman Khan's Substack
- On Latency vs. Quality: "A core part of AI product sense is knowing when users will accept higher latency for a better answer versus when they need an immediate, if simpler, response." — Source: Product Growth Podcast
- On the Chat Interface: "Chat is a fallback interface. Good AI product sense means designing specific workflows where the AI operates behind the scenes." — Source: Supra Insider
- On Data Flywheels: "An AI product must get better as people use it. If your product does not capture data to improve the next interaction, you lack AI product sense." — Source: Arize AI Blog
- On Managing Expectations: "You have to design the product in a way that sets the right expectations. If it looks like a search bar, people will treat it like Google." — Source: Lenny's Podcast
- On Edge Cases: "In traditional software, edge cases are rare. In generative AI, edge cases are the default state. You must design for them." — Source: Mind the Product
- On Problem Selection: "The most important product decision is choosing a problem where the model's error rate is acceptable to the end user." — Source: Learning from Machine Learning Podcast
Part 5: Non-Deterministic Design Principles
- On Determinism: "Users expect software to behave the same way every time they click a button. Generative AI breaks this expectation by design." — Source: Learning from Machine Learning Podcast
- On Graceful Degradation: "Design your system so that when the AI fails, it fails gracefully. Provide standard UI fallbacks rather than a broken experience." — Source: Maven AI Courses
- On User Correction: "Build interfaces that make it easy for users to correct the AI. This fixes the immediate problem and provides data for your evals." — Source: Aman Khan's Substack
- On Transparency: "Do not hide the AI. Tell the user when a response is generated by a model so they adjust their trust levels accordingly." — Source: Lenny's Podcast
- On Steerability: "Users need to feel in control. Give them knobs and dials to steer the AI's output rather than forcing them to guess the right prompt." — Source: Product Growth Podcast
- On Handling Hallucinations: "Treat hallucinations as a UI problem, not just a model problem. How do you design an interface that encourages the user to verify facts?" — Source: Arize AI Blog
- On the Illusion of Choice: "Sometimes offering fewer options generated by AI is better than an open-ended chat that paralyzes the user." — Source: Supra Insider
- On Statefulness: "Non-deterministic systems feel more reliable when they maintain context across sessions. Statefulness covers a lot of model flaws." — Source: Mind the Product
- On Feedback Mechanisms: "A thumbs up or thumbs down button is rarely enough. Capture the specific text the user edited to understand why the model failed." — Source: Lenny's Newsletter
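One possible way to capture the specific text a user edited is to diff the model's output against the user's final version. The sketch below uses Python's standard `difflib` at word granularity. The `capture_edit` helper, the record shape, and the sample strings are hypothetical.

```python
import difflib


def capture_edit(model_output: str, user_final: str) -> dict:
    """Record which words the user changed in the model's output."""
    a, b = model_output.split(), user_final.split()
    matcher = difflib.SequenceMatcher(a=a, b=b)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep only replace / insert / delete operations
            changes.append(
                {"op": tag, "model": " ".join(a[i1:i2]), "user": " ".join(b[j1:j2])}
            )
    return {"model_output": model_output, "user_final": user_final, "changes": changes}


# Hypothetical interaction: the user corrects the day in a drafted message.
record = capture_edit(
    "The meeting is on Tuesday at 3pm",
    "The meeting is on Thursday at 3pm",
)
print(record["changes"])
```

A record like this says what the model got wrong, which a thumbs-down alone does not, and it can be added to an eval set as a new case.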
Part 6: Observability and the ML Lifecycle
- On Production Monitoring: "Shipping an AI feature without observability is like flying a plane blindfolded. You need to know what users are asking and what the model is answering." — Source: Arize AI Blog
- On Data Drift: "Models degrade over time because user behavior changes. Observability tools help you catch data drift before it impacts retention." — Source: Learning from Machine Learning Podcast
- On Tracing: "When an LLM chain fails, you need tracing to see which specific step (retrieval, prompt construction, or generation) caused the error." — Source: Product Growth Podcast
- On Prompt Management: "Treat prompts as code. They need version control, testing, and monitoring in production just like any other software component." — Source: Supra Insider
- On Cost Visibility: "LLM costs can scale exponentially. Good observability means tracking token usage and cost on a per-feature and per-user basis." — Source: Aman Khan's Substack
- On Safety and Abuse: "Monitoring is your first line of defense against prompt injection and malicious use. You cannot stop it if you cannot see it." — Source: Lenny's Podcast
- On Closing the Loop: "The ML lifecycle is only complete when production data flows back into your evaluation sets to improve the next iteration." — Source: Maven AI Courses
- On Root Cause Analysis: "When a user reports a bad response, observability allows you to replay the exact context and prompt that generated it." — Source: Arize AI Blog
- On Metric Selection: "Do not just monitor latency and error rates. Monitor semantic similarity to expected answers and tone consistency." — Source: Mind the Product
- On Proactive Intervention: "The goal of observability is to detect a spike in hallucinations and roll back a prompt change before users start complaining." — Source: Product Growth Podcast
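The tracing idea above can be sketched with a small span recorder that wraps each step of a chain and notes which one failed. This is a minimal illustration, not how any particular observability product works: the `Trace` class, the step names, and the simulated timeout are all assumptions.

```python
import time
from contextlib import contextmanager


class Trace:
    """Collects one record per step so a failure can be pinned to a step."""

    def __init__(self) -> None:
        self.spans: list[dict] = []

    @contextmanager
    def span(self, name: str):
        start = time.perf_counter()
        record = {"name": name, "status": "ok", "error": None}
        try:
            yield record
        except Exception as exc:
            record["status"] = "error"
            record["error"] = repr(exc)
            raise  # let the caller decide how to handle the failure
        finally:
            record["duration_s"] = time.perf_counter() - start
            self.spans.append(record)

    def first_failure(self):
        """Return the name of the first failed step, or None."""
        return next((s["name"] for s in self.spans if s["status"] == "error"), None)


def call_model(prompt: str) -> str:
    # Hypothetical stand-in for a real model call that fails.
    raise TimeoutError("model call timed out")


def answer(question: str, trace: Trace) -> str:
    with trace.span("retrieval"):
        docs = ["Refunds are accepted within 30 days."]  # stubbed retrieval
    with trace.span("prompt_construction"):
        prompt = f"Context: {docs[0]}\nQuestion: {question}"
    with trace.span("generation"):
        return call_model(prompt)


trace = Trace()
try:
    answer("What is the refund window?", trace)
except TimeoutError:
    pass

print([(s["name"], s["status"]) for s in trace.spans])
print(trace.first_failure())
```

Because each span also stores its duration and error, the same records could feed latency monitoring or be replayed during root cause analysis.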
Part 7: Prototyping with AI Agents
- On Cursor and Claude: "Tools like Cursor and Claude Code have fundamentally changed the PM role. PMs can now build functional prototypes without waiting for engineering resources." — Source: Maven AI Courses
- On Rapid Iteration: "The ability to prototype an AI feature in an afternoon allows you to test the feeling internally before committing to a rigid development cycle." — Source: Aman Khan's Substack
- On Understanding Constraints: "When you build it yourself with an AI coding agent, you visceralize the latency and context window limitations. It makes you a better PM." — Source: Lenny's Newsletter
- On Communicating with Engineers: "Handing an engineer a working, albeit messy, prototype built in Cursor is infinitely better than handing them a ten-page text document." — Source: Supra Insider
- On Testing Hypotheses: "Use AI agents to quickly test whether a specific LLM can even perform the task you want before designing the entire UI around it." — Source: Product Growth Podcast
- On Unlocking Creativity: "When the barrier to writing code drops, PMs can explore weird, experimental ideas that would never get prioritized in a sprint planning meeting." — Source: Learning from Machine Learning Podcast
- On the PM Toolkit: "Cursor is not just an engineering tool; it is a product discovery tool. It should be open on your desk alongside Figma and Jira." — Source: Mind the Product
- On Learning by Doing: "You cannot learn prompt engineering by reading about it. You have to write code, call the API, and see the JSON break." — Source: Lenny's Podcast
- On Agentic Workflows: "Building prototypes with agents teaches you how to break complex tasks into smaller, sequential steps, which is exactly how you need to design AI products." — Source: Aman Khan's Substack
- On Shifting Paradigms: "We are moving from PMs defining what to build, to PMs building the first draft themselves." — Source: Arize AI Blog
Part 8: Lessons from Spotify, Apple, and Cruise
- On Scale and Precision: "Working on autonomous vehicles at Cruise teaches you that 99 percent accuracy is completely useless if the 1 percent results in a catastrophic failure. Evals matter." — Source: Lenny's Podcast
- On Personalization: "Spotify mastered the art of algorithmic curation. The lesson is that the AI must feel deeply personal, but the interface must remain simple." — Source: Product Growth Podcast
- On Consumer Expectations: "Consumers do not care about the underlying model architecture. They care about whether the Apple feature solves their problem elegantly and privately." — Source: Aman Khan's Substack
- On Hardware and Software Integration: "The best AI experiences are tightly coupled with the hardware and the native OS constraints, not just an API wrapper." — Source: Supra Insider
- On Feedback Loops: "Discover Weekly works because the feedback loop is frictionless. Listening to a song or skipping it trains the model without explicit user effort." — Source: Mind the Product
- On Safety Critical Systems: "In autonomous driving, testing is the product. This mindset must be applied to generative AI if we want it to be used in enterprise environments." — Source: Learning from Machine Learning Podcast
- On Data Infrastructure: "You cannot build good machine learning models without pristine data pipelines. The plumbing is often more important than the algorithm." — Source: Arize AI Blog
- On Cross-Disciplinary Teams: "Building great AI requires musicians, designers, data scientists, and engineers sitting in the same room. Isolation breeds poor products." — Source: Lenny's Newsletter
- On Launching AI: "When launching an AI feature, start with a narrow, highly constrained use case where you can guarantee the outcome before expanding the scope." — Source: Product Growth Podcast
- On Career Trajectory: "The common thread across these tech companies is the shift toward systematic evaluation. Master evals, and you can build AI products anywhere." — Source: Maven AI Courses