Lessons from Douwe Kiela

Douwe Kiela is an AI researcher who co-invented Retrieval-Augmented Generation (RAG) at Meta's FAIR lab. As CEO and co-founder of Contextual AI, he now builds enterprise language models that securely process external data. This profile covers his approach to retrieval, the flaws in static AI benchmarks, and what agentic systems actually need to work in the real world.

Part 1: The Origins and Evolution of RAG

On inventing RAG: "We always planned to have a nicer sounding name, but when it came time to write the paper, no one had a better idea." — Source: [NextBrick]
On the core premise of retrieval: "The fundamental idea of RAG is that language models should not need to memorize facts in their weights, but rather retrieve them dynamically from an external knowledge base." — Source: [Contextual AI]
On the mainstream adoption of RAG: "It's been interesting to see RAG become a completely mainstream idea. We don't have to explain it anymore." — Source: [The Data Exchange]
On early skepticism: "In 2020, the idea of bolting a search engine onto a language model seemed somewhat counterintuitive to those focused entirely on scaling up model parameters." — Source: [Towards Data Science]
On naming conventions: "If the original FAIR research team had known how widespread the technique would become, they would have spent significantly more time workshopping the acronym." — Source: [NextBrick]
On parametric versus non-parametric memory: "RAG bridged a major gap by combining the parametric memory of a pre-trained sequence-to-sequence model with the non-parametric memory of a dense vector index of Wikipedia." — Source: [arXiv:2005.11401]
On the shift in model design: "The introduction of RAG marked a departure from treating language models as closed-book knowledge stores, shifting them toward an open-book exam format." — Source: [Meta AI Research]
On early implementation: "The initial implementations of RAG required careful joint training of the retriever and the generator to ensure the retrieved documents actually improved the final output." — Source: [arXiv:2005.11401]
On knowledge update mechanisms: "One of the original motivations for RAG was the ability to update the model's knowledge simply by swapping out documents in the index, without needing to retrain the neural network." — Source: [Meta AI Research]
On the foundational paper: "The 2020 paper 'Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks' established the baseline architecture that would eventually dominate enterprise generative AI." — Source: [arXiv:2005.11401]
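
The open-book idea and the document-swap update described above can be sketched in a few lines. This is a minimal illustration, not the architecture from the 2020 paper: `ToyIndex`, `generate`, and `answer` are hypothetical names, the retriever is a simple token-overlap scorer rather than a dense vector index, and the "generator" is a stub that only echoes the retrieved context. The country and city names are made up.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ToyIndex:
    """Hypothetical non-parametric memory: a document list that can be swapped."""

    def __init__(self, documents):
        self.documents = list(documents)

    def retrieve(self, query, k=1):
        """Return the k documents sharing the most tokens with the query."""
        query_counts = Counter(tokenize(query))

        def overlap(document):
            doc_counts = Counter(tokenize(document))
            return sum(min(query_counts[t], doc_counts[t]) for t in query_counts)

        return sorted(self.documents, key=overlap, reverse=True)[:k]


def generate(query, passages):
    """Stand-in for a language model: it only conditions on retrieved text."""
    return f"Q: {query} | Context: {' '.join(passages)}"


def answer(query, index, k=1):
    """Retrieve first, then generate from what was retrieved."""
    return generate(query, index.retrieve(query, k))


# Updating knowledge means swapping documents, not retraining anything.
index_v1 = ToyIndex(["The capital of Freedonia is Alpha City.", "Bananas are yellow."])
index_v2 = ToyIndex(["The capital of Freedonia is Beta City.", "Bananas are yellow."])

question = "What is the capital of Freedonia?"
print(answer(question, index_v1))
print(answer(question, index_v2))
```

The same `answer` function returns different facts for the two indexes, which is the property the quotes describe: the knowledge lives in the index, so it changes when the documents do.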

Part 2: Contextual AI and the Enterprise Data Problem

On enterprise constraints: "But getting this to actually work at scale on real world data where you have enterprise constraints is a very different problem." — Source: [Towards Data Science]
On bringing models to data: "We bring the model to the data to ensure privacy. Auditability is also a critical feature." — Source: [Microsoft Start]
On the inadequacy of general models: "General-purpose models often fail in corporate environments because they lack the specific, specialized context that lives inside private company databases." — Source: [SaaStr]
On context engineering: "The future of working with AI in business is shifting away from prompt engineering and moving heavily toward context engineering, ensuring the model receives the right information at the right time." — Source: [The Data Exchange]
On data privacy: "Moving enterprise data into a third-party black-box model creates severe privacy and security risks for highly regulated industries." — Source: [Contextual AI]
On clear attribution: "With properly integrated enterprise models, users can trace back exactly where the data came from, ensuring clear attribution for every generated claim." — Source: [Microsoft Start]
On deploying AI: "When you deploy AI, you need to know if it works or not, and you need to know where it falls short, and you need to have trust in your deployment." — Source: [Latent Space]
On the reality of corporate data: "Real enterprise data is often messy, unstructured, and fragmented across multiple internal systems, requiring advanced retrieval mechanisms to make it useful for an AI model." — Source: [SaaStr]
On the goal of Contextual AI: "The mission of the company is to build AI that inherently understands a company's unique context rather than relying on generic internet-trained knowledge." — Source: [Contextual AI]
On specialized applications: "The most valuable AI deployments will not be generic chatbots, but highly specialized systems tuned specifically to a company's internal workflows." — Source: [SaaStr]

Part 3: RAG 2.0: Moving Beyond Stitched-Together Systems

On current RAG limitations: "A typical RAG system today uses a frozen off-the-shelf model for embeddings, a vector database for retrieval, and a black-box language model for generation, stitched together through prompting or an orchestration framework." — Source: [B Capital]
On end-to-end optimization: "The defining characteristic of RAG 2.0 is the shift from pieced-together components to an end-to-end trained system where the retriever and generator learn together." — Source: [Contextual AI Blog]
On system agency: "These systems represent what Contextual AI calls RAG 2.0, which are systems that have agency in deciding when and what to retrieve." — Source: [Latent Space]
On performance gains: "Jointly optimizing the retrieval and generation phases typically yields higher accuracy and fewer hallucinations than passing data blindly from a vector database to an LLM." — Source: [Contextual AI Blog]
On orchestration frameworks: "Relying entirely on external orchestration tools like LangChain to patch together incompatible models often results in suboptimal and brittle enterprise systems." — Source: [B Capital]
On custom embeddings: "Pre-trained embeddings struggle with domain-specific enterprise terminology, making custom-trained embedding models a requirement for advanced retrieval." — Source: [Towards Data Science]
On the evolution of retrieval: "Future models will dynamically adjust their retrieval strategy based on the specific query, rather than executing a fixed top-K search for every prompt." — Source: [Latent Space]
On integrating components: "When the language model is trained to understand its specific retriever's output format, the system wastes fewer tokens trying to parse irrelevant context." — Source: [Contextual AI Blog]
On the RAG 2.0 paradigm: "Moving to RAG 2.0 means treating the entire retrieval and generation pipeline as a single differentiable machine learning problem." — Source: [Contextual AI Blog]
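
One way to picture "the retriever and generator learn together" is the marginal likelihood used in the original 2020 RAG paper, where the answer probability is a sum over retrieved documents of retriever probability times generator probability. The sketch below computes that loss on made-up numbers. Whether RAG 2.0 uses this exact objective is not stated in the sources quoted here, so treat it as an assumption; `marginal_nll` and all values are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw retriever scores into a probability distribution over documents."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def marginal_nll(retriever_scores, generator_likelihoods):
    """Negative log of sum_z p(z | x) * p(y | x, z) over the retrieved documents.

    Both factors sit inside one loss, so a gradient-based trainer would
    update the retriever and the generator from the same signal.
    """
    p_doc = softmax(retriever_scores)
    p_answer = sum(pz * py for pz, py in zip(p_doc, generator_likelihoods))
    return -math.log(p_answer)


# Hypothetical p(y | x, z) for three retrieved documents; only the first helps.
generator_likelihoods = [0.9, 0.1, 0.1]

loss_uniform = marginal_nll([0.0, 0.0, 0.0], generator_likelihoods)
loss_focused = marginal_nll([2.0, 0.0, 0.0], generator_likelihoods)
print(loss_uniform, loss_focused)
```

The loss drops when the retriever puts more weight on the document the generator finds useful. A stitched-together pipeline with a frozen embedding model has no such feedback path from the generator back to the retriever.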

Part 4: The Flaws in AI Benchmarks and Evaluation

On static benchmarks: "Reliance on faulty benchmarks stunts AI growth. You end up with a system that is better at the test than humans are but not better at the overall task." — Source: [Medium]
On deceptive progress metrics: "It's very deceiving because it makes it look like we're much further than we actually are." — Source: [Medium]
On Dynabench's purpose: "Dynabench was created to move the AI community away from static test sets toward a dynamic, adversarial evaluation process that continually challenges models." — Source: [Meta AI Research]
On human-in-the-loop testing: "The most effective way to expose model flaws is to have human annotators actively try to find edge cases that the model fails to process correctly." — Source: [Meta AI Research]
On Goodhart's Law in AI: "When a specific benchmark becomes the target for all AI research, optimizing for that test ceases to be a reliable indicator of true intelligence or capability." — Source: [Facebook Engineering]
On the illusion of super-human performance: "Models frequently beat human baselines on popular academic datasets, yet fail entirely in messy, real-world deployment scenarios." — Source: [Facebook Engineering]
On continuous evaluation: "AI evaluation should be treated as a continuous, evolving process rather than a static leaderboard that gets solved and abandoned." — Source: [Meta AI Research]
On adversarial data collection: "Gathering data specifically from the mistakes models make allows researchers to train the next generation of models on the exact areas where current systems are weak." — Source: [Facebook Engineering]
On real-world readiness: "A high score on a standard language benchmark reveals very little about how a model will perform when integrated into an enterprise software stack." — Source: [Latent Space]
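
The adversarial collection loop described above, keeping the human-written examples that a model gets wrong, can be sketched as a simple filter. This is an illustration only and not Dynabench's implementation: `toy_model` is a deliberately weak, hypothetical classifier and the examples are invented.

```python
def toy_model(text):
    """Hypothetical sentiment model: 'positive' only if the word 'good' appears."""
    return "positive" if "good" in text.lower().split() else "negative"


def collect_adversarial(model, annotated_examples):
    """Keep only the human-written examples that the model labels incorrectly."""
    return [(text, label) for text, label in annotated_examples if model(text) != label]


# Annotators write inputs with the true label, aiming to fool the model.
examples = [
    ("This is good", "positive"),
    ("Not good at all", "negative"),
    ("Absolutely wonderful", "positive"),
]

fooled = collect_adversarial(toy_model, examples)
print(fooled)
```

The examples that survive the filter are the ones worth adding to the next round of training and evaluation data, which is what keeps the benchmark from going stale.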

Part 5: Hallucinations, Trust, and Auditability in AI

On grounding AI output: "Grounding a model's response in retrieved documents is the most reliable method for reducing hallucinations in knowledge-intensive tasks." — Source: [ai.engineer]
On verifiable answers: "For an AI system to be useful in a professional setting, every factual claim it generates must include a specific, auditable citation to the source material." — Source: [Microsoft Start]
On the root cause of hallucinations: "Hallucinations often occur because models are forced to guess information they encountered briefly during pre-training rather than looking up the precise facts." — Source: [Towards Data Science]
On building trust: "With our model, because it's more integrated, you can trace back exactly where the data came from." — Source: [Microsoft Start]
On the cost of errors: "In consumer applications, a hallucinated fact is an amusement; in enterprise deployments, it is a liability that can break user trust and cause material damage." — Source: [SaaStr]
On managing uncertainty: "A reliable AI system should be capable of determining when it does not have enough retrieved context to answer a question and should clearly state its ignorance." — Source: [Contextual AI]
On auditing model behavior: "True auditability requires a system architecture where the exact path from user query to retrieved document to generated response is transparent to the operator." — Source: [Contextual AI Blog]
On the limitations of fine-tuning: "Fine-tuning a model on specific facts is an inefficient and unreliable way to prevent hallucinations compared to simply retrieving the correct facts at runtime." — Source: [The Data Exchange]
On the necessity of guardrails: "System-level guardrails must be implemented alongside retrieval to verify that the generated output accurately reflects the retrieved source text." — Source: [Latent Space]
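
The three requirements in this part (a citation for every claim, a check that the output reflects the source text, and an explicit "I don't know") can be combined in one small guardrail. The sketch below is a rough illustration under strong assumptions: `support` uses plain token overlap as a stand-in for a real entailment or groundedness check, the `min_support` threshold is arbitrary, and the passages are invented.

```python
import re

ABSTAIN = "I do not have enough retrieved context to answer that."


def tokens(text):
    """Return the set of lowercase alphanumeric tokens in a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def support(claim, passage):
    """Fraction of the claim's tokens that also appear in the passage."""
    claim_tokens = tokens(claim)
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & tokens(passage)) / len(claim_tokens)


def answer_with_citations(claims, passages, min_support=0.6):
    """Attach the best-supporting passage id to each claim, or abstain.

    claims: sentences from a draft answer.
    passages: mapping of passage id to retrieved text.
    """
    cited = []
    for claim in claims:
        scored = ((pid, support(claim, text)) for pid, text in passages.items())
        best_id, best_score = max(scored, key=lambda pair: pair[1], default=(None, 0.0))
        if best_score < min_support:
            # One unsupported claim is enough to refuse the whole answer.
            return {"answer": ABSTAIN, "citations": []}
        cited.append((claim, best_id))
    return {"answer": " ".join(claims), "citations": cited}


passages = {
    "doc-1": "The warranty period for the X200 is 24 months.",
    "doc-2": "Returns are accepted within 30 days.",
}

grounded = answer_with_citations(["The X200 warranty period is 24 months."], passages)
ungrounded = answer_with_citations(["The X200 warranty covers water damage."], passages)
print(grounded)
print(ungrounded)
```

Because every returned claim carries a passage id, an operator can follow the path from query to document to response, which is the auditability property the quotes call for.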

Part 6: The "RAG is Dead" Misconception

On recurring narratives: "Every few months, the AI world experiences a similar pattern. A new model drops with a larger context window, and social media lights up with declarations that RAG is dead." — Source: [Contextual AI Blog]
On the core misunderstanding: "But these pronouncements misunderstand RAG's purpose and why it will always have a role in AI." — Source: [Contextual AI Blog]
On the persistence of retrieval: "Regardless of what you call it, retrieval is at the heart of generative AI." — Source: [AI Engineer]
On long context windows: "A one-million token context window does not replace the need for RAG; it simply allows the system to process larger amounts of retrieved information at once." — Source: [Towards Data Science]
On the cost factor: "Passing a massive enterprise database into an LLM context window for every query is computationally wasteful and economically unviable compared to targeted retrieval." — Source: [Contextual AI Blog]
On latency constraints: "As context windows grow, the latency and time-to-first-token increase significantly, making direct retrieval a faster and more practical solution for real-time applications." — Source: [SaaStr]
On dynamic knowledge updates: "Long context windows do not solve the problem of needing to update information in real-time; retrieval remains the most efficient way to access live, changing data." — Source: [AI Engineer]
On information overload: "Giving a model too much irrelevant context in a massive prompt can actually degrade its performance and lead to lost-in-the-middle phenomena." — Source: [Contextual AI Blog]
On the symbiosis of technologies: "Long context models and RAG are complementary technologies; advanced RAG systems will use long context capabilities to reason over more comprehensive search results." — Source: [Towards Data Science]
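
The cost argument above comes down to simple arithmetic on prompt size. The sketch below compares the tokens sent per query when the whole corpus is placed in the context window against a top-k retrieval prompt. Every number is a hypothetical placeholder (corpus size, chunk size, k, and price), and the calculation ignores the cost of running the retriever itself.

```python
def prompt_tokens_full_context(corpus_tokens, query_tokens):
    """Tokens sent per query when the entire corpus goes into the prompt."""
    return corpus_tokens + query_tokens


def prompt_tokens_retrieval(chunk_tokens, top_k, query_tokens):
    """Tokens sent per query when only the top-k retrieved chunks are included."""
    return chunk_tokens * top_k + query_tokens


def cost_usd(tokens, usd_per_million_tokens):
    """Input cost for a prompt at a given (hypothetical) price per million tokens."""
    return tokens / 1_000_000 * usd_per_million_tokens


# Hypothetical workload: a 1M-token corpus, 500-token chunks, 10 chunks per query.
full = prompt_tokens_full_context(corpus_tokens=1_000_000, query_tokens=100)
targeted = prompt_tokens_retrieval(chunk_tokens=500, top_k=10, query_tokens=100)

price = 1.0  # assumed USD per million input tokens, for illustration only
print(full, targeted, round(full / targeted, 1))
print(cost_usd(full, price), cost_usd(targeted, price))
```

Under these assumed numbers the full-context prompt is roughly two hundred times larger per query, and the gap widens as the corpus grows while the retrieval prompt stays the same size.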

Part 7: Scaling AI for the Real World

On the scaling trap: "I think the trouble that people get into with it is scaling it up. It's great on 100 documents, but now all of a sudden I have to go to 100,000 or 1,000,000 documents." — Source: [Towards Data Science]
On vector database limits: "Simple cosine similarity searches in vector databases break down when attempting to retrieve highly nuanced information across massive corporate archives." — Source: [B Capital]
On production environments: "Moving a RAG prototype from a local Jupyter notebook to a high-availability production cluster exposes severe bottlenecks in retrieval latency and ranking accuracy." — Source: [Towards Data Science]
On hybrid retrieval: "Scaling successfully requires moving beyond pure dense vector search and integrating hybrid approaches that combine dense embeddings with keyword-based exact matching algorithms like BM25." — Source: [Contextual AI Blog]
On infrastructure costs: "As the scale of documents increases, optimizing the embedding models and retrieval indexing becomes essential to keeping cloud infrastructure costs manageable." — Source: [SaaStr]
On data chunking strategies: "At scale, simple fixed-size text chunking methods fail; deploying effective retrieval requires intelligent parsing that respects document structure and semantics." — Source: [The Data Exchange]
On cross-encoder re-ranking: "To achieve high accuracy on large datasets, a lightweight initial retrieval step must often be followed by a computationally heavier cross-encoder to re-rank the results." — Source: [B Capital]
On access control: "Real-world enterprise search requires integrating complex permission architectures so that models only retrieve and generate answers based on documents the specific user is authorized to view." — Source: [Contextual AI]
On maintaining freshness: "Scaling an AI application means building pipelines that can continually ingest and index new documents in real-time without disrupting user queries." — Source: [Towards Data Science]
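
Two of the points above, hybrid retrieval and permission-aware search, fit together naturally in one sketch. The code below filters documents by the user's groups first, ranks the remainder with a keyword scorer and a dense scorer, and merges the two rankings with reciprocal rank fusion. It is a sketch under stated assumptions: the keyword scorer is a crude term counter standing in for BM25, the two-dimensional vectors are invented rather than produced by an embedding model, fusion by reciprocal rank is one common choice and not necessarily what Contextual AI uses, and no cross-encoder re-ranking step is included.

```python
import math
import re


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_rank(query, docs):
    """Rank ids of docs containing query terms (crude stand-in for BM25)."""
    terms = set(tokenize(query))
    scores = {d["id"]: sum(t in terms for t in tokenize(d["text"])) for d in docs}
    matched = [doc_id for doc_id, score in scores.items() if score > 0]
    return sorted(matched, key=scores.get, reverse=True)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dense_rank(query_vec, docs):
    """Rank doc ids by cosine similarity to the query vector."""
    scores = {d["id"]: cosine(query_vec, d["vec"]) for d in docs}
    return sorted(scores, key=scores.get, reverse=True)


def reciprocal_rank_fusion(rankings, k=60):
    """Sum 1 / (k + rank) for each doc across rankings; higher is better."""
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


def hybrid_search(query, query_vec, docs, user_groups, top_k=2):
    """Apply access control before ranking so hidden docs never reach the model."""
    allowed = [d for d in docs if d["groups"] & user_groups]
    rankings = [keyword_rank(query, allowed), dense_rank(query_vec, allowed)]
    return reciprocal_rank_fusion(rankings)[:top_k]


# Hypothetical corpus with toy 2-d vectors and group-based permissions.
docs = [
    {"id": "a", "text": "Error code E42 means the pump overheated.", "vec": [0.9, 0.3], "groups": {"support"}},
    {"id": "b", "text": "Pump maintenance schedule and cooling guidance.", "vec": [0.1, 0.9], "groups": {"support"}},
    {"id": "c", "text": "Executive budget note on error E42.", "vec": [1.0, 0.1], "groups": {"hr"}},
    {"id": "d", "text": "Holiday rota for the support team.", "vec": [0.6, 0.6], "groups": {"support"}},
]

support_hits = hybrid_search("E42 error", [1.0, 0.1], docs, user_groups={"support"})
hr_hits = hybrid_search("E42 error", [1.0, 0.1], docs, user_groups={"hr"})
print(support_hits, hr_hits)
```

Document "c" is the closest match on both signals, yet it never appears for the support user because the permission filter runs before any ranking. Filtering after ranking would risk leaking restricted text into the prompt.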

Part 8: The Future of AI Architectures and Open Source

On open science: "The broader AI community benefits significantly when researchers publish not just their model weights, but their training data, evaluation metrics, and failure cases." — Source: [Meta AI Research]
On multimodal AI: "The concepts underlying RAG are not limited to text; future architectures will increasingly incorporate multimodal retrieval, allowing models to cite and generate images, audio, and video contextually." — Source: [The Data Exchange]
On collaborative platforms: "Platforms like Hugging Face have accelerated AI progress by providing standardized tools and a central hub for researchers to share and iterate upon each other's models." — Source: [Business Insider]
On small, specialized models: "The future of AI in production will likely rely heavily on smaller, highly specialized models optimized for specific tasks, rather than depending exclusively on massive, general-purpose LLMs." — Source: [Latent Space]
On algorithmic efficiency: "Improving the efficiency of retrieval and generation architectures is just as important as scaling up compute, particularly for making AI accessible to smaller organizations." — Source: [Towards Data Science]
On bridging academia and industry: "The transition from research labs to enterprise startups is essential for taking theoretical AI advancements and hardening them into stable, usable software." — Source: [The Data Exchange]
On the democratization of AI: "Open-source ecosystems prevent the concentration of AI capabilities in the hands of a few tech giants, fostering innovation across a wider array of industries." — Source: [Business Insider]
On evaluation ecosystems: "Open platforms for model evaluation will be critical to maintaining an objective standard of progress as the field grows increasingly commercialized." — Source: [Meta AI Research]
On the long-term vision: "The ultimate goal of integrating retrieval and generation is to create systems that act as reliable, context-aware agents capable of executing complex workflows, rather than serving merely as passive chat interfaces." — Source: [Contextual AI Blog]
On bridging research and enterprise: "Ultimately, bridging the gap between cutting-edge research and practical enterprise deployment is the most important frontier for the next decade of artificial intelligence." — Source: [SaaStr]
