Lessons from Aidan Gomez

Aidan Gomez co-authored the 2017 "Attention Is All You Need" paper, which introduced the Transformer architecture that underpins most modern large language models. He left Google Brain to co-found Cohere, which focuses on bringing efficient, private language models to enterprises. This profile collects his views on the limits of current AI architectures, the reality of building a startup, and why practical utility matters more than the pursuit of artificial general intelligence.

Part 1: The Transformer and Its Origins

On the core breakthrough: "The Transformer was about making models that could understand context—not just individual words, but how they relate to each other." — Source: [eBoona]
On the lucky mistake: "I got in through an administrative mistake. I was a third-year undergrad... They said, 'We don't hire undergrad students,' but I was already there." — Source: [Clay]
On organic team formation: "Google Brain allowed people to just like organically form groups and teams. We realized folks were kind of working on the same thing... so then we just decided to team up and join forces." — Source: [Simplecast]
On academic freedom: "A group of researchers, given total academic freedom, accidentally stumbled into one of the most important breakthroughs in AI history." — Source: [YouTube]
On infinite resources: "It was a great setup at Google Brain where I had amazing collaborators and all of the compute that I could want." — Source: [Aidan Gomez]
On naming the architecture: "I thought [Attention] was too simple... I had a lot of names. There was something called 'Cargonet.' I wrote something where one layer was convolution, one was attention, one I called 'recognition'." — Source: [YouTube]
On the sensitivity of data: "The importance of data—I underrated it dramatically. I thought it was just scale... It is a bit surreal how sensitive the models are to their data." — Source: [Bookscrolling]
On leaving the lab: "I’m seeing the golden era of these big industrial labs doing pure research coming to an end. We are entering a new era of startups." — Source: [The Logic]
On the frustration of stagnancy: "There were all these insane demos and super-exciting progress and then nothing—just stagnancy... People aren’t pushing this forward. We need to build something that lets more people use this." — Source: [The Logic]
On understanding intelligence: "I've always been interested in that idea of 'if we can build it, we understand it.' If we can build artificial intelligence, we understand how intelligence emerges." — Source: [McKinsey]

Part 2: The Future of AI Architecture

On the need for successors: "The world needs something better than Transformers... I think all of us here hope it gets succeeded by something that will carry us to a new plateau of performance. It would be really sad if it is the best we can do." — Source: [YouTube]
On the longevity of Transformers: "It’s really surprising to me that the transformers we train today look so similar to what was back then [in 2017]. It kind of disturbs me how similar to the original form we are." — Source: [YouTube]
On experiential learning: "The only way to make these models smart... you couldn't make these models smarter in a vacuum of a lab. You actually have to go out and put them in people's hands because you kind of need the world to annotate." — Source: [YouTube]
On the limit of scale: "It’s definitely true that if you throw more compute at the model, if you make the model bigger, it’ll get better. It’s the most trustworthy way to improve models. It’s also the dumbest." — Source: [Marketing AI Institute]
On the flattening curve: The industry is witnessing a flattening of the scaling curve for model improvements, marking a transition from a capex training model to a consumption-based inference model. — Source: [No Priors]
On synthetic data: Synthetic data is the crucial element for the next generation of models, shifting reliance away from the public internet toward high-quality, model-generated reasoning chains. — Source: [Bookscrolling]
On reasoning vs. scale: "We’ve just scratched the surface... all the interesting problems require reasoning, and there’s a lot of white space for us to go after." — Source: [Reddit]
On the shift in AI focus: The idea that AI models are plateauing is incorrect; rather, the field is on the verge of a significant change in capabilities driven by the introduction of reasoning and planning. — Source: [Reddit]
On pushing beyond text: The next necessary leap in AI architecture involves training models on world-based tasks rather than relying purely on text prediction to simulate understanding. — Source: [Anchor]

Part 3: Enterprise AI and Business Strategy

On AI 2.0: "The next phase of development will move beyond generic LLMs towards tuned and highly optimized end-to-end solutions that address the specific objectives of a business. AI 2.0 will accelerate adoption, value creation, and will help fundamentally transform how businesses operate." — Source: [Business Insider]
On isolation vs. integration: "It’s very rare that in isolation an LLM is actually useful. It can be fun to chat to a model... but overwhelmingly, the real value we’ll find from LLMs is in plugging them into broader systems." — Source: [YouTube]
On the future of companies: Every company will eventually be forced to become an AI company, integrating intelligence into their core operations to remain competitive. — Source: [Exponential View]
On custom models: The most successful enterprise applications in the future will be built by companies that take the time to build and train their own specialized models. — Source: [Redpoint]
On regulated industries: "Healthcare represents one of the most consequential opportunities for AI and it demands secure, sovereign, and domain-specific systems." — Source: [Intuition Labs]
On the value of proprietary data: Businesses are sitting on troves of proprietary data that public models have never seen, and tapping into that data is the only way to gain a unique competitive advantage. — Source: [YouTube]
On the pyramid of automation: Enterprise AI functions like a pyramid: at the bottom are generalist copilots, and at the top are highly specialized agents that use proprietary tools to execute complex workflows. — Source: [YouTube]
On human productivity: "What inspires me much more is increasing human productivity... letting humans do more and accomplish more." — Source: [Bookscrolling]
On shifting focus from hype: "We just want to quietly do good work, build really good software, and focus on becoming an essential partner to our customers." — Source: [Upstarts Media]
On solving real problems: "I want doctors to spend less time writing up notes and filling out paperwork... I want to help them solve those problems, give them agents that can help them do research on something they've never seen." — Source: [YouTube]

Part 4: Efficiency and Hardware Pragmatism

On production economics: "Some of the other models that we compete against are huge and super inefficient. You can't actually put that into production, because as soon as you're faced with real users, costs blow up and the economics break." — Source: [AP News]
On hardware constraints: "We set a constraint saying within two GPUs, we are going to squeeze the most intelligence out of those two GPUs that we can. But that's the cap." — Source: [YouTube]
On infrastructure spending: "Spending billions of dollars a year isn't necessary to produce top-tier tech that's competitive. I think spending more and more on infrastructure for training AI models, rather than 'inference,' is a mistake." — Source: [Business Insider]
On the Transformer's original edge: "What is the core unique insight in that paper that propelled so much of this forward? Efficiency. It was extremely well suited to scaling it up across many GPUs." — Source: [YouTube]
On avoiding brute force: Relying purely on scale and brute force to improve model performance is an inefficient path; there are far more elegant ways to achieve intelligence. — Source: [Marketing AI Institute]
On capital intensity: "The fact that [DeepSeek] published their training efficiency numbers let people see that it doesn't need to be so capital-intensive to publish fantastic models." — Source: [Business Insider]
On driving costs down: The primary economic impact of AI will not be in creating novel consumer experiences, but in relentlessly driving costs down for enterprise operations. — Source: [Anchor]
On open validation: The emergence of highly capable, open-weight models trained efficiently serves as a massive validation that the open-source approach can compete with heavily capitalized private labs. — Source: [Business Insider]
On model footprints: If a model requires three or four GPUs to run effectively, it violates the core engineering philosophy of building lightweight, deployable intelligence. — Source: [YouTube]

Part 5: Independence and Sovereign AI

On sovereign control: "Sovereignty and independence is the most important thing in the world... If we can help countries become more sovereign, have more autonomy... that’s a success." — Source: [Anchor]
On avoiding vendor lock-in: "If you're a large enterprise, you don't want to be completely locked in to your cloud provider, one ecosystem. Because they're just going to keep squeezing you for margin over time." — Source: [Upstarts Media]
On owning vs. renting: "Enterprises no longer want to rent AI — they want to own it." — Source: [Business Insider]
On true open source: To push for democratic technology, it is essential to release powerful models under permissive licenses like Apache 2.0, moving beyond merely offering "open weights." — Source: [VentureBeat]
On data privacy as a blocker: "Enterprises must put privacy first if models are to touch more and more sensitive data. That’s something that will unlock usage in enterprises because right now, they’re hesitant to build systems that touch sensitive data." — Source: [Business Insider]
On private deployments: "For Cohere, we deploy completely privately within a VPC or on-premises. Only you can see your data." — Source: [McKinsey]
On the risk of shadow AI: "If you don’t give them secure access, they’re going to get insecure access from just using a consumer service... and then that exposes that data to all future users." — Source: [YouTube]
On hard mode independence: Building an AI company without taking massive early checks from major cloud providers is "hard mode," but it is the only way to maintain strategic independence. — Source: [Upstarts Media]
On cloud agnosticism: "We’re not locked into one cloud... it’s available everywhere, even on-prem. We bring AI to wherever data actually lives." — Source: [Business Insider]

Part 6: Perspectives on AGI and "AI Religion"

On cosplaying religion: "I never liked the vibes of the whole AGI... it felt like cosplaying, it felt like people were larping a new religion frankly... I don't like that ethos of 'creating God'." — Source: [YouTube]
On the doomerism debate: "Discussing the AI threat to human existence is an absurd use of our time." — Source: [Amazon AWS]
On the AGI timeline: "AGI means a lot of things to a lot of different people... It’s not a binary. It’s not discrete; it’s continuous. We’re already quite far along that road." — Source: [YouTube]
On the five-year horizon: "Over the next five years, I think we'll be able to automate any human task that we decide we don't want to do. Just as the steam engine took the physical load off our backs, AI will do the same for the cognitive load." — Source: [McKinsey]
On existential risk narratives: "The fear that certain individuals and organizations espouse about this technology being a terminator... those are stories humanity has been telling itself for decades... It’s easier to capture people’s imagination and fear when you tell them that." — Source: [AP News]
On present capabilities: "I think we already have AGI to a large extent... the logical and actually correct answer is to have the model do it. I promise you it knows more than I do." — Source: [YouTube]
On real-world harms: The industry must shift its focus away from science fiction scenarios of rogue superintelligence and address immediate, practical harms like bias and workforce displacement. — Source: [Anchor]
On the mystery of intelligence: "For me, artificial intelligence is the last mystery of science. We understand our physical world so well, yet intelligence and consciousness are a total mystery." — Source: [McKinsey]
On saving the world pragmatically: "I want to save the world with AI. I want to put it to work to actually make healthcare better... that inspires me much more than building God." — Source: [YouTube]

Part 7: AI Agents, Tool Use, and RAG

On the burden of proof: "When the model responds, it’s citing back to a specific document... it has the burden of proof for why it’s responding. It has to say, 'I’m saying this because I read it over here'." — Source: [YouTube]
On the limit of text generation: "The model's impact on the world is a string of characters that's coming out. Once you're able to plug a model into APIs... it can actually go out and work for us. So they can effect change in the real world." — Source: [YouTube]
On supercharging intelligence with tools: "A model shouldn't have to manually memorize the product of two five-digit numbers; it should be able to just plug those into a calculator... giving our models access to a set of tools... is going to be supercharging for the model." — Source: [YouTube]
On the CPU analogy: A large language model functions effectively as a CPU for language, requiring peripheral tools to actually interact with and manipulate the outside environment. — Source: [YouTube]
On the step change of RAG: "With RAG, it’s like a step change drop [in hallucinations]." — Source: [YouTube]
On the necessity of citations: "Being able to audit these models, being able to explain why they said what they said... that’s going to be crucial to compliance in regulated industries." — Source: [YouTube]
On the agentic experience: "People don't realize how useful these agents can be if they're given full access, the same level that humans have. When you actually interact with a model that has everything, it feels magical." — Source: [Upstarts Media]
On RAG vs. long context: While expansive context windows are technically impressive, they cannot replace the necessity of Retrieval-Augmented Generation when dealing with petabytes of distributed enterprise data. — Source: [YouTube]
On teaching models to reason: "There were tasks pre-reasoning we just couldn't get a model accurate enough to accomplish... it's very much about teaching these models to reason through solving the problems that exist within business using the tools that humans in businesses use." — Source: [YouTube]
On building agent platforms: "It’s like an AI agent platform where you can build agents, plug those agents into all the software and data that the humans inside your organization have access to, and then ask them to go do things." — Source: [YouTube]
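
The tool-use and citation ideas in this part can be made concrete with a small sketch. It is an illustration only, not Cohere's implementation or API: the names `Document`, `TOOLS`, `call_tool`, `retrieve`, and `answer_with_citations` are hypothetical, the retriever uses toy word-overlap scoring rather than a real search index, and no language model is involved. It shows two things the quotes describe: delegating arithmetic to a calculator instead of memorizing it, and attaching a source identifier to every retrieved statement so the answer carries its own "burden of proof."

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Document:
    doc_id: str  # identifier used as the citation (hypothetical naming scheme)
    text: str


def multiply(a: int, b: int) -> int:
    """Calculator tool: exact arithmetic instead of a memorized product."""
    return a * b


# Hypothetical tool registry: tool name -> callable.
TOOLS: Dict[str, Callable[..., int]] = {"multiply": multiply}


def call_tool(name: str, *args: int) -> int:
    """Dispatch a tool call by name, failing loudly on unknown tools."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](*args)


def retrieve(query: str, corpus: List[Document], k: int = 1) -> List[Document]:
    """Return up to k documents sharing at least one word with the query."""
    words = set(query.lower().split())
    scored = [(len(words & set(d.text.lower().split())), d) for d in corpus]
    scored = [(score, d) for score, d in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]


def answer_with_citations(query: str, corpus: List[Document]) -> str:
    """Answer only from retrieved text, tagging each statement with its source."""
    hits = retrieve(query, corpus)
    if not hits:
        return "No supporting document found."
    return " ".join(f"{d.text} [{d.doc_id}]" for d in hits)


if __name__ == "__main__":
    corpus = [
        Document("policy-7", "Refunds are issued within 30 days"),
        Document("faq-2", "Shipping takes five business days"),
    ]
    print(call_tool("multiply", 12345, 67890))
    print(answer_with_citations("when are refunds issued", corpus))
```

Refusing to answer when nothing is retrieved is a design choice made for this sketch: it keeps every statement auditable against a document identifier, which is the property the quotes tie to compliance in regulated industries.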

Part 8: Building Startups and Operating Philosophy

On assuming success: "Operate in a way that assumes success. Assume that you are going to succeed and work backward from there. Ask, 'What do I need to be doing today to reach that success?'" — Source: [Clay]
On trusting your instincts: "Trust yourself more." In a fast-growing startup, self-trust is the primary navigation tool through inevitable chaos. — Source: [Upstarts Media]
On building the airplane: Building a rapidly scaling AI company feels exactly like the cliché: "you're building the airplane as you're flying it." — Source: [Upstarts Media]
On resilience and failure: "For every one time you succeed, you fail relentlessly... But then occasionally, you don't fail. You succeed—and that is the most rewarding feeling." — Source: [Upstarts Media]
On protecting the board: "Our Board is the beating heart of the company, and protecting that relationship is extremely important." — Source: [Upstarts Media]
On returning to the open source community: "I come from the open-source community. I am a researcher of machine learning and I benefited massively from open research and the release and communication and distribution of knowledge." — Source: [Financial Times]
On playing providers off each other: "Strategic independence allows you to play the cloud providers off of each other," preventing a startup from being squeezed for margin by a single ecosystem. — Source: [Upstarts Media]
On the need for different regulations: "Enterprise AI companies need different rules from consumer-facing ones," as treating chat applications and core infrastructure under the same regulatory framework is fundamentally flawed. — Source: [Business Insider]
On long-term ownership: True success for a foundational AI startup involves eventually going public, allowing everyday people to own a stake rather than remaining perpetually locked in private equity or massive cloud provider portfolios. — Source: [Upstarts Media]
