Lessons from Daphne Koller

Daphne Koller co-founded Coursera and co-wrote the standard textbook on probabilistic graphical models. She now runs Insitro, which uses machine learning to parse biological data, map human diseases, and speed up drug discovery.

Part 1: The Calculus of Digital Biology

On predictive frameworks: "Calculus is to physics as AI is to biology. Disciplines become something that we can manage as humans when they become predictable... Calculus was that for Newtonian physics... and we've never had that for biology because biology is complicated and intertwined. Now... we can start to create models of biological systems." — Source: [YouTube]
On the new era: "We're entering a 'new era of science'—we finally have enough data and technology to truly enable better drugs for patients." — Source: [McKinsey]
On digital biology: "AI is merging with biology, and the result, digital biology, will have tremendous repercussions in human health." — Source: [ZDNet]
On unmanageable complexity: "Generative AI in biology [deals] with a level of complexity that is not something that the human brain will really ever be able to understand." — Source: [ZDNet]
On a new compass: "With AI, we're able to use large amounts of data to build 'compasses' that allow us to know, when we get to these forks in the road, which path will most likely lead to success." — Source: [McKinsey]
On human limitations: Biology produces data at a volume and dimensionality that exceed human intuition, so machines are needed to extract the underlying signal from the noise. — Source: [Possible Podcast]
On bits meeting atoms: "In biology, you're in the physical world where bits meet atoms. Data are scarce, biased and causality matters. You have to build the dataset and couple learning tightly to experimentation." — Source: [Observer]
On accelerating testing: "We can do experiments in a matter of weeks instead of years." — Source: [Forbes]
On the data tidal wave: "AI is supercharging a 'tidal wave' of data enabled by a convergence of advancements in biotechnology... It's being fed by the AI tidal wave and... robotics tidal wave." — Source: [Washington Post]
On defining disease: "Our focus is 'de-convoluting' the biology of human disease... Disease is often defined by coarse-grained symptomatic manifestations... we end up with a mishmash that really doesn't speak to the underlying biological causes." — Source: [McKinsey]

Part 2: Fixing Drug Discovery

On failure rates: "Less than one in 10 [drugs] actually ends up getting approved. It's because we truly do not understand the biology... we are basically stabbing in the dark, and then are surprised that our drugs don't work." — Source: [YouTube]
On the primary goal: "While speed is important, failing faster is not helpful; our north star is using A.I. to win more. We believe that this is the path for meaningfully transforming this industry." — Source: [Stanford]
On clinical trial waste: Identifying the right patients for a clinical trial using predictive models prevents effective drugs from failing simply because they were given to the wrong subset of people. — Source: [Lex Fridman Podcast]
On engineering medicine: "We aspire to create a much more engineered process, with a higher success rate." — Source: [McKinsey]
On disease-in-a-dish: Using induced pluripotent stem cells allows researchers to test interventions on human-relevant tissue models rather than relying entirely on animal testing that fails to translate to humans. — Source: [Medium]
On closing the loop: By directly feeding the results of high-throughput laboratory experiments back into machine learning algorithms, the system can continuously design the next, more informative experiment. — Source: [Observer]
On early intervention: The objective is to identify disease drivers well before the downstream symptomatic manifestations appear in patients. — Source: [McKinsey]
On reversing Eroom’s Law: The pharmaceutical industry has historically seen exponentially decreasing efficiency over time; computational tools present the first realistic opportunity to reverse this trend. — Source: [NEJM AI Grand Rounds]
On the gender data gap: Historical clinical data collection took the easy path and focused primarily on men to avoid menstrual cycle fluctuations, which resulted in drugs that are significantly less effective for half the population. — Source: [Stanford]

Part 3: The Importance of Data Quality

On data infrastructure: "People are not investing nearly as much as they should in data collection, data aggregation, data quality in order to make sure that the AI has the right input to work on." — Source: [WEF]
On biological data realities: "If you really wanted to interrogate complex biological systems, data quality was much, much more important than in many other applications... AI doesn't get trapped in things that are artifacts, as opposed to signal." — Source: [YouTube]
On haphazard data collection: Creating a massive pile of data by stitching together disparate sources often backfires because inconsistencies in early-stage collection usually emerge years later during critical trials. — Source: [Stanford]
On web-scale limitations: Unlike large language models that can simply scrape the internet, biological research fundamentally lacks a pre-existing, clean, massive dataset of human health. — Source: [Possible Podcast]
On proxy tasks: In data-poor environments, models need high-value proxy tasks to learn underlying structural patterns before they attempt to predict the ultimate target. — Source: [Stanford]
On generating novel data: Insitro focuses heavily on building internal capacity to generate bespoke, machine-learning-ready datasets rather than relying exclusively on public repositories. — Source: [Medium]
On avoiding artifacts: Machine learning models are exceptionally good at finding shortcuts; with poor-quality data, the model learns experimental artifacts rather than true biological mechanisms. — Source: [YouTube]
On tight coupling: "You have to build the dataset and couple learning tightly to experimentation." — Source: [Observer]
On resolution: The convergence of automation and AI allows scientists to view cellular interactions at a significantly finer resolution than previously possible. — Source: [Washington Post]
On empirical verification: Theoretical models in biology are useless without rapid physical feedback loops to validate computational predictions. — Source: [Observer]

Part 4: Navigating Uncertainty with Probabilistic Models

On the probabilistic framework: "The framework of probabilistic graphical models... provides a general approach for [reasoning]. The approach is model-based, allowing interpretable models to be constructed and then manipulated by reasoning algorithms." — Source: [Goodreads]
On modeling reality: "Because uncertainty is an inescapable aspect of most real-world applications, the book focuses on probabilistic models, which make the uncertainty explicit and provide models that are more faithful to reality." — Source: [Goodreads]
On the zero-probability trap: "A common mistake is to assign a probability of zero to an event that is extremely unlikely, but not impossible. The problem is that one can never condition away a zero probability, no matter how much evidence we get." — Source: [Stack Exchange]
On solving two problems: Graphical models offer a natural tool for addressing two issues found throughout applied engineering: uncertainty and complexity. — Source: [Coursera]
On interpretable AI: Unlike black-box neural networks, probabilistic models allow humans to inspect the relationships and causal links between variables. — Source: [YouTube]
On structured environments: Graphical models emerged because real-world problems required representations that go beyond simple vector inputs, demanding a framework for richly structured relationships. — Source: [YouTube]
On medical diagnosis applications: Probabilistic frameworks are uniquely suited for integrating noisy evidence from disparate symptoms and tests to determine an underlying condition. — Source: [YouTube]
On early skepticism: In the early 1990s, integrating probability with AI was considered fringe, as mainstream artificial intelligence was strictly focused on rigid logic. — Source: [YouTube]
On continuous learning: "I think a computer that could learn gradually on its own by observing more and more data about the world is more likely to be a path towards achieving some kind of intelligence." — Source: [Daily Maverick]
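
The zero-probability trap and the diagnosis item above can be made concrete with a small Bayes' rule calculation. This is a minimal sketch, not anything taken from Koller's textbook or course: the function names, the 90% sensitivity, the 10% false-positive rate, and the assumption that repeated tests are conditionally independent given the patient's true state are all illustrative choices.

```python
def posterior(prior, p_pos_given_disease, p_pos_given_healthy):
    """Apply Bayes' rule once, after observing a single positive test."""
    numerator = p_pos_given_disease * prior
    evidence = numerator + p_pos_given_healthy * (1.0 - prior)
    return numerator / evidence


def update_repeatedly(prior, n_tests, sensitivity=0.9, false_positive_rate=0.1):
    """Fold in n_tests positive results, one at a time.

    Assumes (for illustration only) that the tests are conditionally
    independent given whether the patient has the disease.
    """
    belief = prior
    for _ in range(n_tests):
        belief = posterior(belief, sensitivity, false_positive_rate)
    return belief


if __name__ == "__main__":
    # A prior of exactly zero cannot be revised, however many tests come back positive.
    print(update_repeatedly(0.0, 10))
    # A tiny but nonzero prior is overwhelmed by the same evidence.
    print(update_repeatedly(1e-6, 10))
```

With a prior of exactly zero, the numerator is zero at every step, so the belief never moves. With a one-in-a-million prior, ten positive results push the belief above 0.99, which is the practical reason to reserve zero for events that are truly impossible.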

Part 5: Transforming the Classroom

On outdated models: "Our approach to education has remained largely unchanged since the Renaissance: From middle school through college, most teaching is done by an instructor lecturing to a room full of students, only some of them paying attention." — Source: [QuoteFancy]
On active engagement: "The mind is not a vessel that needs filling, but wood that needs igniting." — Source: [TED]
On the flipped classroom: "Why now that we have all these digital tools can't we take that content, put that online and allow the instructor in that precious classroom time to meaningfully engage with the students?" — Source: [TED]
On data-driven teaching: Student interactions with online platforms transform the study of human learning from a hypothesis-driven field into a strictly data-driven science. — Source: [TED]
On teaching to the medium: You cannot simply record an hour-long lecture and put it online; effective digital education requires broken-down, interactive units suited for the format. — Source: [TED]
On mastery learning: "Mastery is easy to achieve using a computer because a computer doesn't get tired of showing you the same video five times." — Source: [YouTube]
On Benjamin Bloom’s 2-sigma problem: Individual tutoring reliably improves student performance by two standard deviations; technology is the only viable path to scale that level of personalization globally. — Source: [Lex Fridman Podcast]
On hybrid models: "Many will realize that face-to-face instruction was certainly not the optimal or only way of teaching... much of that [hybrid format] will remain even after face-to-face teaching is more possible." — Source: [Education & Career News]
On removing the lecture burden: Universities should stop using their most precious resource—faculty time—to repeatedly deliver identical lectures, and instead use it for direct student mentorship. — Source: [YouTube]
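
The two-standard-deviation figure in the Bloom item above is easier to picture as a percentile. The sketch below assumes test scores in the conventional classroom are normally distributed, which is a simplifying assumption for illustration; the function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def percentile_after_shift(effect_size_sd):
    """Percentile, within the original score distribution, of a student
    who started at the median and improved by effect_size_sd standard deviations."""
    return 100.0 * normal_cdf(effect_size_sd)


if __name__ == "__main__":
    print(round(percentile_after_shift(2.0), 1))  # prints 97.7
```

Under that assumption, a median student who gains two standard deviations ends up ahead of roughly 98% of a conventionally taught class.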

Part 6: Democratizing Opportunity & Lifelong Learning

On the right to learn: "I would like to make it so that education was a right, and not a privilege." — Source: [QuoteFancy]
On hidden potential: "Amazing talent can be anywhere. Maybe the next Albert Einstein or Steve Jobs is living in a remote village in Africa." — Source: [Coursera]
On opening doors: "For the millions here and abroad who lack access to good, in-person education, online learning can open doors that would otherwise remain closed." — Source: [AZQuotes]
On continuous skills: "It's a shame that for so many people learning stops when we finish high school or when we finish college... we would be able to learn something new every time we wanted." — Source: [YouTube]
On degree obsolescence: "We have passed the stage in history where what you learn in college can last you for a lifetime. After 15 years, that learning is obsolete." — Source: [Knowledge Star]
On bridging the skills gap: "Even though there is rampant unemployment in many parts of the world, there are still large numbers of jobs that are going unfilled because employers are having a hard time identifying people with the right set of skills." — Source: [QuoteFancy]
On motivation as a credential: "If a student takes a Stanford computer class and a Princeton business class, it shows they are motivated and have skills. We know it has helped employees get better jobs." — Source: [QuoteFancy]
On universal talent: "Talent is universal, but access to resources is not." — Source: [Medium]
On upskilling the educated: "High-quality education provided by MOOCs can be a significant factor in opening doors to opportunity – even among the college-educated." — Source: [QuoteFancy]
On Silicon Valley ethos: "Current ethos in Silicon Valley is that if you build a website that people keep coming back to and is changing the lives of millions, you can eventually make money." — Source: [QuoteFancy]

Part 7: Building a "Bilingual" Culture

On bridging disciplines: "The groups that we're talking about have different ways of thinking, different jargon and different approaches to doing science. Getting them to work together as an integrated team is really our secret sauce." — Source: [Forbes]
On speaking both languages: "I felt like and even today feel like one of the biggest limitations is that there's not that many people who speak both languages... I felt like that was an opportunity to make a really big impact." — Source: [YouTube]
On early collaboration: "And those teams get together not at the time that you know the data is there and is ready to be analyzed... but rather at the time that we decide what problem to work on." — Source: [YouTube]
On asking better questions: When computer scientists and biologists collaborate at the very genesis of a project, they do not just find better answers—they formulate fundamentally better scientific questions. — Source: [YouTube]
On hiring for humility: A bilingual culture requires recruiting individuals who possess the humility and curiosity to learn the terminology and mechanics of an entirely different field. — Source: [a16z]
On structural parity: "How do you make sure that you don't create second-class citizens and that your employees truly see one another as equals? ... We've laid out behavioral norms from the very beginning." — Source: [WEF]
On the 50/50 split: Maintaining a roughly even ratio of life scientists to computational engineers is structurally vital to ensure all projects remain inherently cross-functional. — Source: [YouTube]
On avoiding hand-offs: The traditional industry model where biologists generate data and simply throw it over the wall to data scientists reliably leads to misaligned goals and shoddy results. — Source: [a16z]
On organizational synthesis: True innovation demands the ability to transfer methodologies from one domain to another, creating a synthesized workflow that did not previously exist. — Source: [YouTube]

Part 8: Leadership & Navigating the Unknown

On culture over technology: "Technology leaders... often think that it's all about the technology that you're building... but what persists are your people and the culture that attracts those people." — Source: [YouTube]
On defining high performance: A successful corporate culture is defined by collaboration, continuous innovation, collective boldness, and the ability to execute quickly. — Source: [YouTube]
On dealing with chaos: "The world is noisy and messy. You need to deal with the noise and uncertainty." — Source: [Future]
On persistence: "Start marching forward, even when into the new and unknown, and don't be deterred by obstacles." — Source: [QuoteFancy]
On human-machine collaboration: "The future of most human endeavors is a partnership between a human and a machine... Human creativity and human innovation is still something that is an important partner." — Source: [YouTube]
On testing AI personally: Leaders must play with emerging technologies firsthand; without a visceral understanding of the tools, it is impossible to drive the necessary organizational change. — Source: [YouTube]
On picking partners: "Probably the most important advice to any woman [or leader] interested in a career is to pick your life partner with care. Having a supportive partner is key." — Source: [AZQuotes]
On leaving academia: "I had been experiencing an increased sense of urgency to make a difference in the world more directly, rather than by proxy via students or writing papers." — Source: [Stanford]
On the academic scaling trap: Operating a startup with the free-form, process-light style of a university lab works for the first few dozen employees, but becomes a severe liability as the company scales. — Source: [The Minor Consult]
