Lessons from Koray Kavukcuoglu

As Chief AI Architect at Google and CTO of Google DeepMind, Koray Kavukcuoglu drove the lab's shift from fundamental research to large-scale product engineering. Though he built his reputation on early work like Deep Q-Networks and WaveNet, his focus today is the architecture required for artificial general intelligence. This profile covers his perspective on world models, team dynamics, and what comes next in machine learning.

Part 1: The Pursuit of AGI

On defining AGI: "We do not have the recipe of how to build AGI, because it is still fundamentally a research problem." — Source: [India Times Interview]
On the nature of progress: "Approaching general intelligence requires accepting that there is no static roadmap, demanding continuous iteration rather than waiting for a single breakthrough." — Source: [Financial Times Interview]
On generalization: "True capability is measured by a system's ability to seamlessly adapt to environments and tasks it was never explicitly trained to handle." — Source: [DeepMind: The Podcast]
On milestones: "Achieving human-level performance in isolated games was an early proof of concept, but general intelligence demands multi-domain mastery." — Source: [Sparks! at CERN]
On exploratory research: "Building advanced systems is an inherently exploratory process where researchers must navigate unknowns without a guaranteed path to success." — Source: [Royal Academy of Engineering]
On balancing timelines: "The distance to AGI cannot be measured purely in computing cycles; it is gated by our ability to understand how to correctly structure learning." — Source: [Logan Kilpatrick Interview]
On the end goal: "The objective is to build systems that can independently solve a broad spectrum of real-world problems." — Source: [DeepMind: The Podcast]
On unforeseen challenges: "Every order of magnitude increase in scale reveals new behavioral properties that were entirely absent in smaller systems." — Source: [Financial Times Interview]
On algorithmic generality: "We strive to find single, unified algorithms that can perform well across many different tasks without specialized tuning." — Source: [ICLR Invited Talk]
On maintaining focus: "It is easy to get distracted by narrow successes; the harder task is keeping the research organization aligned toward the long-term goal of general capability." — Source: [Google DeepMind Blog]

Part 2: The Shift to an Engineering Mindset

On scaling research: "Building AGI has moved beyond pure academic endeavor; it requires a strict engineering mindset focused on product integration and scalability." — Source: [Logan Kilpatrick Interview]
On research purity vs. utility: "A lab must eventually transition from purely seeking breakthroughs to architecting systems that deliver real-world utility." — Source: [Financial Times Interview]
On organizational design: "Failures in deploying advanced models are frequently people problems or integration bottlenecks, rather than algorithmic limitations." — Source: [Google DeepMind Blog]
On bridging disciplines: "Frontier research can only scale when researchers, software engineers, and product managers work in tight connection." — Source: [Logan Kilpatrick Interview]
On deployment: "The true test of an architecture is how well it serves users and solves complex problems once integrated into a live product." — Source: [CyberNews Interview]
On software infrastructure: "The software stack required to train massive models is as necessary to the final outcome as the mathematical formulations of the algorithms." — Source: [DeepMind: The Podcast]
On continuous iteration: "There is no final version of a model; systems must be continually updated in response to how they interact with live environments." — Source: [Financial Times Interview]
On failure modes: "Anticipating how a system breaks in a production setting is a distinct discipline from measuring its loss curve during training." — Source: [Royal Academy of Engineering]
On the lab-to-product pipeline: "Ensuring that lab breakthroughs translate into scalable, user-friendly applications requires intentional organizational scaffolding." — Source: [Financial Times Interview]
On resource allocation: "Moving from research to engineering means making hard choices about which experimental pathways justify the massive compute required to scale them." — Source: [Logan Kilpatrick Interview]

Part 3: Gemini and Architectural Goals

On the core objective: "At the end of the day, Gemini is not the architecture, Gemini is the goal that you want to achieve." — Source: [Logan Kilpatrick Interview]
On native multimodality: "Designing models to understand text, audio, and visual data from the ground up prevents the friction of stitching separate models together." — Source: [Google DeepMind Blog]
On architectural evolution: "The specific layers and mechanisms of a model will change, but the ambition to create a universally capable system remains constant." — Source: [Logan Kilpatrick Interview]
On reasoning across modalities: "An advanced model must be able to observe a visual phenomenon and reason about its implications in text natively." — Source: [DeepMind: The Podcast]
On evaluating Gemini: "Success for Gemini is measured by its ability to perform high-level reasoning tasks that previously required human intuition." — Source: [Financial Times Interview]
On model families: "Building different sizes of the same model family ensures that capability can be deployed everywhere from mobile devices to massive data centers." — Source: [Google DeepMind Blog]
On the illusion of architecture: "Researchers sometimes obsess over the specifics of a transformer block when the larger goal is the emergent capability it enables." — Source: [Logan Kilpatrick Interview]
On integrating search and generation: "A model becomes vastly more useful when its internal representations are linked to external, verifiable information retrieval." — Source: [DeepMind: The Podcast]
On long contexts: "Expanding the context window allows models to synthesize information across entire books or hours of video, changing the nature of how we query information." — Source: [Google DeepMind Blog]
On future iterations: "The models we train today are stepping stones; they help us map the territory of what is required for the next generation of architectures." — Source: [Financial Times Interview]

Part 4: World Models and Physical Logic

On moving beyond generation: "True machine intelligence requires systems to do more than generate text or video; it requires grounded world models." — Source: [Logan Kilpatrick Interview]
On physical laws: "A model must eventually learn to understand the underlying physical laws and logic of the environments it simulates." — Source: [Sparks! at CERN]
On simulation as understanding: "If a network can accurately predict the next state of a complex environment, it has developed a functional understanding of its rules." — Source: [ICLR Invited Talk]
On logical coherence: "Language models often hallucinate because they lack a grounded model of how the physical world operates." — Source: [DeepMind: The Podcast]
On spatial reasoning: "Achieving true capability means a system can infer depth, object permanence, and physical constraints purely from its training distribution." — Source: [Sparks! at CERN]
On scientific application: "World models are necessary for making advancements in engineering, finance, and material discovery." — Source: [Logan Kilpatrick Interview]
On training environments: "Rich, interactive simulations provide the grounding necessary for agents to build internal representations of cause and effect." — Source: [ICLR Invited Talk]
On common sense: "What humans call common sense is effectively a highly compressed world model that allows us to rapidly discard physically impossible scenarios." — Source: [DeepMind: The Podcast]
On the limitations of text: "Text is a low-bandwidth projection of reality; models must learn from higher-bandwidth modalities to truly understand the world." — Source: [Sparks! at CERN]

Part 5: Data, Scaling, and Capability

On the scaling debate: "While scale is undoubtedly important, the data fed into models and the specific architectural designs that leverage that scale are equally necessary." — Source: [CyberNews Interview]
On data quality: "Increasing the parameter count only yields returns if the quality and diversity of the training data scale proportionally." — Source: [Financial Times Interview]
On synthetic data: "As we exhaust human-generated text, we must develop reliable methods for models to learn from synthetic data without collapsing into repetitive loops." — Source: [Google DeepMind Blog]
On computational efficiency: "The goal is to extract the maximum amount of capability from every computing cycle used during training." — Source: [Logan Kilpatrick Interview]
On representation learning: "The effectiveness of a scaled model is rooted in its ability to build increasingly abstract representations of the data it consumes." — Source: [ICLR Invited Talk]
On algorithmic improvements: "Simply adding more GPUs is a brute-force approach; sustainable progress requires fundamental improvements to the learning algorithms themselves." — Source: [CyberNews Interview]
On data diversity: "A model trained on a narrow distribution will be brittle; resilience comes from forcing the network to reconcile highly varied data sources." — Source: [DeepMind: The Podcast]
On the limits of memorization: "True scaling success is seen when a model stops memorizing the training set and starts deducing the underlying generative rules." — Source: [ICLR Invited Talk]
On curating datasets: "The engineering effort required to clean and curate training data is often the most underappreciated aspect of building frontier models." — Source: [Financial Times Interview]

Part 6: Reinforcement Learning and Generative Models

On deep reinforcement learning: "Combining deep neural networks with reinforcement learning allowed agents to derive control policies directly from raw sensory input." — Source: [ICLR Invited Talk]
On the legacy of DQN: "The Deep Q-Network proved that a single architecture could learn to play a wide variety of Atari games without domain-specific adjustments." — Source: [DeepMind: The Podcast]
On delayed rewards: "The hardest challenge in reinforcement learning is designing systems that can correctly attribute a successful outcome to an action taken thousands of steps earlier." — Source: [ICLR Invited Talk]
On generative speech: "WaveNet demonstrated that autoregressive generative models could synthesize audio waveform by waveform, achieving unprecedented naturalness." — Source: [ICLR Invited Talk]
On exploration vs. exploitation: "An agent must constantly balance utilizing its current knowledge to gain rewards and exploring unknown states to find better strategies." — Source: [DeepMind: The Podcast]
On generative agents: "Transitioning from models that passively generate data to agents that actively interact with their environment is a core step toward general intelligence." — Source: [ICLR Invited Talk]
On offline reinforcement learning: "Learning from static datasets of past interactions allows us to train agents without the computational cost of continuous live simulation." — Source: [Sparks! at CERN]
On self-play: "When agents compete against versions of themselves, they generate a constantly escalating curriculum that drives continuous improvement." — Source: [Google DeepMind Blog]
On bridging RL and LLMs: "Applying reinforcement learning techniques to large language models is what aligns raw predictive power with helpful, goal-oriented behavior." — Source: [DeepMind: The Podcast]
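
The delayed-reward quote above describes what reinforcement learning texts usually call credit assignment. The following is a minimal sketch of one standard way to pass a late reward back to earlier steps, the discounted return. It is not taken from the interviews. The function name `discounted_returns`, the discount factor `gamma=0.99`, and the 1000-step episode are illustrative assumptions.

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every step t of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backward so each step can reuse the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical sparse-reward episode: 1000 steps, reward only at the very end.
rewards = [0.0] * 999 + [1.0]
returns = discounted_returns(rewards, gamma=0.99)

print(returns[-1])  # credit seen by the final step
print(returns[0])   # credit seen by the first step, 999 steps earlier
```

With these assumed numbers the first step receives only a tiny fraction of the final reward (a factor of 0.99 raised to the 999th power), which is one way to see why attributing outcomes to actions taken far in the past is hard.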

Part 7: Speed, Evaluation, and the Pareto Frontier

On the speed requirement: "Raw capability is not the only metric for success; achieving high performance quickly and efficiently is equally important." — Source: [Logan Kilpatrick Interview]
On the Pareto frontier: "Engineering models involves navigating a tradeoff between the depth of the model's reasoning and the latency of its response." — Source: [Logan Kilpatrick Interview]
On rigorous evaluation: "We must bake safety and reliability into the development process through relentless, adversarial evaluation of our systems." — Source: [Royal Academy of Engineering]
On benchmarks: "Static benchmarks quickly become saturated; we need dynamic evaluation frameworks that evolve as the models become more capable." — Source: [Financial Times Interview]
On safety frameworks: "Developers must focus on creating structures that minimize misuse and unintended consequences before a model is widely deployed." — Source: [Royal Academy of Engineering]
On collaborative safety: "Evaluating the risks of generative AI requires industry-wide forums where researchers can share failure modes and safety techniques." — Source: [Royal Academy of Engineering]
On inference optimization: "Making a model fast during inference is what transforms a research artifact into a daily utility for users." — Source: [Google DeepMind Blog]
On balancing priorities: "The challenge is to maintain the safety constraints of a model without degrading its usefulness or slowing down its response time." — Source: [CyberNews Interview]
On evaluating reasoning: "It is difficult to measure a model's true reasoning ability when we cannot fully untangle what it has simply memorized from its training data." — Source: [DeepMind: The Podcast]
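
The Pareto frontier quote above frames model engineering as a tradeoff between reasoning depth and latency. As a rough sketch of what that means in practice, the code below picks out the non-dominated options from a set of candidates. The `Candidate` type, the model names, and all scores and latencies are hypothetical values invented for illustration; none come from the source.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    quality: float      # hypothetical benchmark score, higher is better
    latency_ms: float   # hypothetical response latency, lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    no_worse = a.quality >= b.quality and a.latency_ms <= b.latency_ms
    strictly_better = a.quality > b.quality or a.latency_ms < b.latency_ms
    return no_worse and strictly_better


def pareto_frontier(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


candidates = [
    Candidate("small", 0.70, 50),
    Candidate("medium", 0.80, 120),
    Candidate("medium-slow", 0.78, 200),
    Candidate("large", 0.90, 400),
]

frontier = pareto_frontier(candidates)
print([c.name for c in frontier])  # ['small', 'medium', 'large']
```

Here "medium-slow" drops out because "medium" is both faster and higher scoring. The remaining three each represent a different point on the tradeoff, and choosing among them is a product decision rather than something the frontier itself settles.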

Part 8: Leadership, Culture, and Collaboration

On building teams: "Everyone is coming together and working together and supporting each other. What makes you tackle really hard problems is having the right team together." — Source: [Logan Kilpatrick Interview]
On global collaboration: "Frontier models are not built in a vacuum; their development requires massive, global collaboration across distinct scientific disciplines." — Source: [Logan Kilpatrick Interview]
On maintaining morale: "The work is inherently difficult, but the enjoyment of collaborating with exceptional peers is what sustains momentum through long research cycles." — Source: [DeepMind: The Podcast]
On AI as a scientific amplifier: "These technologies serve as an amplifier for human intellect, allowing researchers to focus on higher-level conceptual depth." — Source: [Google DeepMind Blog]
On research culture: "A successful lab must tolerate a high failure rate in its experiments while maintaining strict discipline in its engineering practices." — Source: [Financial Times Interview]
On leadership: "Guiding an AI organization requires balancing the natural curiosity of researchers with the strategic directives of product delivery." — Source: [Google DeepMind Blog]
On cross-pollination: "The best ideas often emerge when experts in reinforcement learning sit down with experts in natural language processing and systems engineering." — Source: [DeepMind: The Podcast]
On scaling operations: "As the scope of the models grows, the complexity of managing the human organization building them grows at an equal rate." — Source: [Financial Times Interview]
On long-term vision: "Keeping a team motivated over a decade requires a shared belief that the technology being built will fundamentally advance human scientific discovery." — Source: [Sparks! at CERN]
