As an influential figure in the realm of machine learning and AI, Hamel Husain has shared a wealth of knowledge through his writings, talks, and courses. With a career that includes roles at GitHub, Airbnb, and DataRobot, along with his current work as an independent AI consultant, Husain's insights are highly sought after. [1][2]
On AI Product Development and Evaluation
- On the true nature of AI engineering: "The struggle is: can you articulate what is better? The process of you articulating what is better is the main focus of AI engineering." [3]
- The importance of process over tools: "If you focus on tools not processes, your life is going to look like this whack-a-mole game...you're just going to be rotating around different tools without really having a robust way to improve things." [4]
- The 'tools trap': Many companies fall into a "tools trap," focusing on acquiring new tools rather than refining their development processes. [5]
- Evaluation is a process, not just a metric: "If, in your mind, eval is just a metric, then you're thinking about eval wrong. It's not a metric; it's an entire process." [3]
- The centrality of error analysis: "Error analysis has been around for a really long time, even before machine learning...it is kind of this process where you go through and you look at data and you take notes about what is going wrong." [5]
- Let error analysis guide your evaluations: "You want to respond to the error analysis and let that guide you, because there's an infinite surface area for you to test, for you to eval." [3]
- Prioritize upstream failures: "When evaluating stuff, I focus on what is the most upstream failure I see...you fix the upstream failures before you fix the downstream failures." [3]
- Don't write evals for obvious fixes: "You don't want to write evals for things that you know exactly how to fix and you're pretty confident that you can just fix." [3]
- The power of looking at your data: "Maybe 90% of the whole eval process is looking at your data. You find so much even before writing evals; you'll just find so many bugs, so many opportunities for improvement." [5]
- Start with simple tools for error analysis: You can begin error analysis with simple tools like Excel to categorize and analyze errors. [4]
- The counter-intuitive value of manual data review: "Going through individual data points and reading what is happening like in a focused session provides immense value." [5]
- Avoid vanity metrics: Teams that rely on "vanity dashboards and a 'buffet of metrics'" experience a false sense of security, which is no substitute for customized evals tailored to domain-specific risks. [5]
- Generic eval frameworks are a fallacy: The belief that an "eval framework should be generic enough to apply to any task" is a recipe for failure. [4]
- Discovering what you want takes work: "Knowing what you want is hard. Discovering what you want takes work. Discovering what your users want, that takes work." [3]
- The iterative nature of AI development: The process of building AI products involves looking at data, refining your vision of what the AI should do, and then making it happen. [3]
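The error-analysis loop these quotes describe (observe traces, take free-form notes, then categorize, fixing the most upstream failures first) needs nothing fancier than a spreadsheet. A minimal sketch in Python, where the trace notes and their wording are purely hypothetical stand-ins for a real focused review session:

```python
from collections import Counter

# Hypothetical free-form notes taken while reading individual LLM traces.
# In practice these would come from a focused session in Excel or a notebook.
notes = [
    {"trace_id": 1, "note": "retrieved wrong document"},
    {"trace_id": 2, "note": "hallucinated a date"},
    {"trace_id": 3, "note": "retrieved wrong document"},
    {"trace_id": 4, "note": "formatting broke the JSON output"},
    {"trace_id": 5, "note": "retrieved wrong document"},
]

# After observing for a while, start categorizing: here each raw note
# doubles as a category label; a real pass would merge notes into themes.
failure_modes = Counter(n["note"] for n in notes)

# The most frequent (often most upstream) failure mode is the one to fix first.
for mode, count in failure_modes.most_common():
    print(f"{count}x  {mode}")
```

Tallying categories like this is exactly the step that tells you which evals are worth writing and which failures you can "just fix" without one.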
On the Mindset of an AI Engineer
- Cultivate an experimentation mindset: "You have an idea of different experiments you want to try; you don't even know if it's going to work." [5]
- The rise of the AI Engineer is here: "It's not even if I buy into it; it's already there, and it's not only there." [1]
- Software engineers are driving the AI boom: "It's mostly software engineers, full stack engineers, not machine learning engineers, which is great actually. That's why the whole field has taken off because it's really opened up the persona." [1]
- ML professionals need application development skills: "It's very important for even people like me to figure out how to develop, have good application development skills, otherwise we're kind of leaving a lot on the table." [1]
- Data literacy is a key skill: The ability to "look at LLM logs and telemetry and analyze that data and dig through it and have a good nose for where to poke" is a crucial skill that develops with practice. [3]
- Don't be paralyzed by the need for perfect evals: You don't want to "paralyze yourself with them...you don't want to do that. I would say you want to respond to the error analysis and let that guide you." [3]
- Start with observation before categorization in error analysis: "When you're first doing error analysis, I wouldn't even try to categorize it off the bat. I would just say: observe and learn about what's going on, build some intuition, and then maybe start categorizing." [3]
- The importance of understanding business problems: A key learning from his early career was the importance of understanding the underlying business problem you are trying to solve with data. [3]
- Fuck You, Show Me The Prompt: A title of one of his blog posts, emphasizing the need for transparency and practical examples in AI. [2]
- Stop Saying RAG Is Dead: A call to focus on improving retrieval in Retrieval-Augmented Generation rather than dismissing the technology. [6]
On Building with LLMs
- The value of a bottom-up approach: A key learning is to move away from abstract, top-down frameworks and instead start with real user interactions to identify issues and build tests. [7]
- LLM-as-a-judge needs human calibration: "You need to force a human or domain expert to go through the process of annotating this judge and also making their own assessment so that you can sort of measure the agreement between the human" and the judge. [1]
- Improving LLM-as-a-judge improves your prompting: As you work to improve the agreement between your LLM judge and human annotators, you get better at prompting the LLM. [1]
- You can't improve what you can't measure: A fundamental principle that underscores the importance of evaluations in AI development. [8]
- The challenge of the long tail of edge cases: Even when a model works 80% of the time, there will be a long tail of edge cases that need to be addressed. [8]
- Decomposing tasks can help with model limitations: If a model seems to be failing at a task, consider if decomposing the task into smaller steps could lead to success. [8]
- Skepticism towards overly complex agentic workflows: Husain has expressed skepticism about complex agentic workflows, though he is open to being convinced by their effectiveness. [8]
- A long, detailed prompt can be a powerful tool: "You can write a single prompt like maybe two pages long and get the LLM to just do the thing for hours at a time and it will work." [8]
- Is Fine-Tuning Still Valuable?: A question he poses, indicating the ongoing debate and the need to critically assess the value of different techniques. [2]
- Tokenization Gotchas: A blog post title that highlights the subtle but important technical details that can trip up developers. [2]
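The judge-calibration idea above becomes concrete once you measure agreement between a domain expert's labels and the judge's labels on the same outputs. A minimal sketch: the labels are hypothetical, and Cohen's kappa is one standard chance-corrected agreement measure, not a metric Husain specifically prescribes:

```python
def agreement_stats(human, judge):
    """Raw agreement and Cohen's kappa for paired binary labels (1 = pass)."""
    assert len(human) == len(judge) and human
    n = len(human)
    p_o = sum(h == j for h, j in zip(human, judge)) / n  # observed agreement
    # Chance agreement, from each rater's marginal rate of saying "pass".
    p1h, p1j = sum(human) / n, sum(judge) / n
    p_e = p1h * p1j + (1 - p1h) * (1 - p1j)
    if p_e == 1:  # both raters always emit the same single label
        return p_o, 1.0
    return p_o, (p_o - p_e) / (1 - p_e)

# Hypothetical labels: a domain expert vs. the LLM judge on ten outputs.
human = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
judge = [1, 1, 0, 0, 0, 1, 1, 1, 1, 1]
raw, kappa = agreement_stats(human, judge)
print(f"raw agreement={raw:.2f}, kappa={kappa:.2f}")
```

Disagreements surfaced this way are exactly the cases to study when refining the judge prompt, which is why improving the judge also improves your prompting.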
On Career and Learning
- Building an Audience Through Technical Writing: He advocates for sharing knowledge and has written about strategies for effective technical writing. [6]
- The value of open-source contributions: Husain has led and contributed to numerous popular open-source machine-learning tools. [2][9]
- The importance of continuous learning: His career trajectory from consulting to leading AI at major tech companies and now as an independent consultant demonstrates a commitment to evolving with the field. [1][2]
- Generous office hours and practical exercises are key to effective teaching: His courses are designed to be hands-on, providing "end-to-end exercises, examples and code." [7]
- Avoid common mistakes by learning from others' experiences: He emphasizes learning from the common pitfalls he's seen across dozens of AI implementations. [7]
- GitHub Actions can provide data scientists with new superpowers: A title from his blog highlighting his belief in empowering data scientists with better tools. [2]
- Why Should ML Engineers Learn Kubernetes?: A question he explores, pushing for a broader skill set among machine learning practitioners. [2]
- Embrace a "vibe code your way to minimal product" approach initially: You need to build something first to understand what you want before getting dogmatic about evaluations. [3]
- Statistical robustness in evals is not yet widespread: "In the wild, I haven't seen too much robustness: people trying to add error bars on their evals and things like that." [4]
- Hiring for evals is challenging: The ideal person has a unique blend of data science analysis skills and the ability to solve practical engineering problems. [8]
- Decomposing the analysis part of evals can make it more accessible: By breaking down the evaluation task into simpler judgments (e.g., "one or zero"), it becomes a muscle the whole team can learn. [8]
- Writing evaluation annotation criteria is a learnable muscle: Teams can and should develop the skill of creating clear standards for evaluating AI outputs. [8]
- Don't believe that you need specialized tools to start looking at data: Simple tools can be very effective, and the belief that you need complex tools can be an excuse to avoid looking at data. [4]
- "Just trust your gut" is a recipe for failure: He sarcastically lists this as a way to "maximize failure," emphasizing the need for data-driven decisions. [4]
- The future of RAG lies in better retrieval: This is a key takeaway from his argument against the idea that RAG is obsolete. [6]
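One way to add the error bars mentioned above, given the simple one-or-zero judgments Husain recommends decomposing evals into, is a percentile bootstrap on the pass rate. A sketch under those assumptions; the sample outcomes and resample count are illustrative, not prescribed:

```python
import random

def bootstrap_ci(outcomes, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for a binary pass rate."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(outcomes)
    # Resample with replacement and record each resample's pass rate.
    rates = sorted(
        sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples)
    )
    lo = rates[int((alpha / 2) * n_resamples)]
    hi = rates[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Hypothetical one-or-zero judgments over 40 sampled outputs (28 passes).
outcomes = [1] * 28 + [0] * 12
lo, hi = bootstrap_ci(outcomes)
print(f"pass rate = {sum(outcomes)/len(outcomes):.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

With only 40 samples the interval is wide, which is itself the point: error bars tell you whether a change in the headline pass rate is signal or noise.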
Learn more:
- Why Your AI Product Needs Evals with Hamel Husain - Humanloop
- Hamel's Blog
- A Field Guide to Rapidly Improving AI Products -- With Hamel Husain - YouTube
- Common Mistakes People Make with Evals [Hamel Husain] - YouTube
- Mindset Over Metrics: How to Approach AI Engineering | Hamel Husain - YouTube
- Hamel's Substack | Hamel Husain | Substack
- AI Evals For Engineers & PMs by Hamel Husain and Shreya Shankar on Maven
- AI Evals For Engineers: Course Preview (Chapters 1-3 of 8) - YouTube
- Hamel Husain - O'Reilly Media