Lessons from Soumith Chintala

Soumith Chintala co-created PyTorch, the software framework beneath most modern AI research. PyTorch won researchers over by putting them ahead of raw machine efficiency, on the premise that complex tools must be easy to debug to be useful. This piece details his approach to managing open-source communities, surviving hardware wars, and deciding which trends to ignore.

Part 1: The Design of PyTorch & Software Engineering

On Explicit Building Blocks: "PyTorch favors explicit building blocks that are easy to understand and debug. While explicit design requires more code, it makes the program's behavior predictable and errors easier to trace." — Source: [PyTorch Design Philosophy]
On Simplicity over Ease: "There is a difference between simple and easy. Easy APIs hide complexity but are impossible to debug when they fail; simple APIs expose the logic so the user understands what is actually happening." — Source: [PyTorch Design Philosophy]
On Product Moats: "Just having that clear thesis of what you're betting on... you don't just try to do everything, you just bet on some things and you're hoping that your bet also is true and that amplifies the structural advantage to your product." — Source: [Gradient Dissent Podcast]
On Python Integration: "We wanted to build a tool that feels like Python, where you can just use a standard debugger and see what's happening inside the model." — Source: [Latent Space Podcast]
On the Cost of Flexibility: "If you build a framework that absolutely optimizes for a modeler to be flexible and do their work most productively... it will come at a cost with making backend engineers happier." — Source: [Gradient Dissent Podcast]
On Avoiding Restriction-First Regimes: "The framework avoids 'restriction-first' regimes like mandatory static shapes or graph-mode only execution that might offer performance gains but limit a researcher's ability to experiment." — Source: [PyTorch Design Philosophy]
On Reasonable Performance: "The goal is to provide 'best-in-class' performance without sacrificing the user experience. If a performance optimization introduces significant user friction, it should be rejected or made optional." — Source: [PyTorch Design Philosophy]
On Being Pragmatic with Code: "You have to be absolutely stubborn on the principles and philosophy that you're driving, but everything else is changeable." — Source: [PyTorch Developer Day]
On Tooling as Intellectual Infrastructure: "Approximately 70% of AI developers use PyTorch without realizing it, as it has become the intellectual infrastructure embedded within higher-level abstractions and libraries." — Source: [Scaling AI Infrastructure Interview]
On Dynamic Computational Graphs: "Unlike early versions of TensorFlow that used static 'define-then-run' graphs, dynamic 'define-by-run' graphs allow the network structure to change at runtime, which is essential for NLP and reinforcement learning." — Source: [PyTorch Design Philosophy]
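
The define-by-run idea in the last quote can be made concrete with a toy example. The sketch below is not PyTorch code: it is a minimal scalar autograd written in plain Python, and the names Value and forward are hypothetical, chosen only for illustration. It shows the graph being recorded while ordinary Python runs, so a data-dependent while loop decides how many operations end up in it.

```python
class Value:
    """A scalar that remembers how it was computed (toy autograd, not PyTorch)."""

    def __init__(self, data, parents=(), grad_fns=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents    # Values this one was computed from
        self._grad_fns = grad_fns  # one local-derivative function per parent

    def __mul__(self, other):
        # Executing the multiplication is what adds a node to the graph.
        return Value(
            self.data * other.data,
            (self, other),
            (lambda g: g * other.data, lambda g: g * self.data),
        )

    def backward(self):
        # Order the nodes so each one is processed after everything that uses it.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, grad_fn in zip(node._parents, node._grad_fns):
                parent.grad += grad_fn(node.grad)


def forward(x, w):
    """Multiply by w until the result reaches 10; the step count depends on the data."""
    h, steps = x, 0
    while abs(h.data) < 10.0:  # plain Python control flow shapes the graph
        h = h * w
        steps += 1
    return h, steps


x, w = Value(1.0), Value(2.0)
out, steps = forward(x, w)
out.backward()
print(steps, out.data, w.grad)  # 4 multiplications, so out = x * w**4
```

Calling forward with a different w produces a graph of a different depth with no recompilation step, and a standard debugger can stop inside the loop. In PyTorch the same pattern applies to tensors rather than scalars.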

Part 2: The Eager Mode & User Experience

On Subjective Metrics: "There is no objective metric for 'good user experience.' While you can track downloads and citations, the most valuable product iterations come from subjective feedback and listening to users." — Source: [Gradient Dissent Podcast]
On Measuring Success: "To me, the metrics we use are more of a tailwind or sanity check... but we don't really use them to inform our development cycle or even incentivize our success." — Source: [Gradient Dissent Podcast]
On Developer Paging: "Prioritize reducing 'paging'—the mental cost of switching contexts. Keep systems simple, even if complexity offers a slight theoretical upside." — Source: [Soumith's Personal Blog]
On Fast Time-to-Output: "A core tenet of good tooling is fast time-to-first-output. For large projects, the ability to iterate quickly is more important than having a perfect grand vision from day one." — Source: [Soumith's Personal Blog]
On Bridging Research and Production: "Provide a path to production via tools like TorchScript and torch.compile without forcing researchers to change their coding style prematurely." — Source: [PyTorch Developer Conference]
On Building for the Modeler: "PyTorch’s core thesis was to put the researcher—the 'modeler'—at the center of the universe." — Source: [Gradient Dissent Podcast]
On Dealing with Errors: "When things fail, the system's internal logic should not be opaque. If a user has to explicitly move tensors, they immediately know where a memory error occurred." — Source: [PyTorch Design Philosophy]
On TorchDynamo: "Acquiring arbitrary Python code and converting it into an optimized format without breaking the 'eager' user experience is a major technical breakthrough." — Source: [Gradient Dissent Podcast]
On Elegance in Tools: "Tools shouldn't just work; they should feel elegant and joyful for the people spending 12 hours a day inside them." — Source: [Meta FAIR Retrospective]
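
The "On Dealing with Errors" quote above argues that explicit tensor moves make failures easy to locate. The sketch below illustrates that argument with a toy model in plain Python. It is not PyTorch's implementation: ToyDevice, ToyTensor, and the element-count memory budget are all hypothetical. The point is only that when data crosses devices in exactly one user-visible call, an out-of-memory error surfaces on that line.

```python
class ToyDevice:
    """A pretend accelerator with a fixed memory budget (hypothetical)."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity  # how many elements fit on this device
        self.used = 0

    def allocate(self, n):
        free = self.capacity - self.used
        if n > free:
            raise MemoryError(f"{self.name}: need {n} elements, only {free} free")
        self.used += n


class ToyTensor:
    """A list of numbers that lives on exactly one device."""

    def __init__(self, values, device):
        device.allocate(len(values))
        self.values = list(values)
        self.device = device

    def to(self, device):
        # The only place data crosses devices, so a failed transfer
        # is always raised on a line the user wrote.
        if device is self.device:
            return self
        return ToyTensor(self.values, device)

    def __add__(self, other):
        # No silent transfers: mixing devices is an error, not a hidden copy.
        if other.device is not self.device:
            raise RuntimeError(
                f"operands on different devices: "
                f"{self.device.name} and {other.device.name}"
            )
        summed = [a + b for a, b in zip(self.values, other.values)]
        return ToyTensor(summed, self.device)


cpu = ToyDevice("cpu", capacity=1000)
gpu = ToyDevice("gpu", capacity=4)

a = ToyTensor([1.0, 2.0, 3.0], cpu)
b = ToyTensor([10.0, 20.0, 30.0], cpu)

a_gpu = a.to(gpu)  # fits: 3 of the 4 slots are now used
try:
    b_gpu = b.to(gpu)  # does not fit, and fails right here at the move
except MemoryError as err:
    print("move failed:", err)
```

A design that copied operands between devices implicitly inside the addition would be shorter to use, but the same failure would then appear deep inside an arithmetic expression instead of at the transfer.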

Part 3: The Philosophy of Open Source

On Distributing Opportunity: "I'm irrationally interested in open source. I think open source has that fundamental way to distribute opportunity in a way that is very powerful." — Source: [Latent Space Podcast]
On Decentralizing Knowledge: "Knowledge was very centralized, but I saw that evolution of knowledge slowly getting decentralized. And that ended up helping me learn quicker and faster for zero dollars." — Source: [Latent Space Podcast]
On the Coordination Problem: "Open source always has a coordination problem. If there's a vertically integrated provider with more resources, they will just be better coordinated than open source." — Source: [Latent Space Podcast]
On the "Switzerland" Strategy: "Open source acts as a neutral Switzerland for the industry—a collaborative layer that prevents any single corporation from monopolizing AI development." — Source: [Latent Space Podcast]
On Maintainer Responsibility: "Once you become a maintainer, it's kind of a responsibility. You're really focusing on growing the ecosystem as well. It's not just the code stuff; it's everything—go evangelize it, give talks, all of that stuff." — Source: [Gradient Dissent Podcast]
On Community Limits: "The core team cannot build everything. Provide the fundamental building blocks and empower the community to build specialized libraries on top of them." — Source: [PyTorch Developer Day]
On Listening to the Top-Heavy Users: "A small, dedicated team engaging with 'top-heavy' users—those who bring disproportionate value—is more effective than broad, impersonal metrics." — Source: [Gradient Dissent Podcast]
On the Necessity of Open Weight Models: "Open-source models and frameworks are essential for safety, transparency, and rapid innovation." — Source: [Meta AI Keynote]
On Empowering Rockstars: "Manage teams by empowering 'rockstars' found online. Often the best hires are people you have only interacted with through GitHub or blog posts." — Source: [Meta FAIR Retrospective]
On True AI Accessibility: "AI is delicious when it is accessible and open-source." — Source: [Latent Space Podcast]

Part 4: AI Hardware & The Infrastructure Layer

On the CUDA Moat: "CUDA is a 'software prison' for those locked into NVIDIA's ecosystem. While it is the gold standard for stability, the cost of switching is what truly keeps companies there." — Source: [Reddit AMA]
On Hardware Vulnerability: "PyTorch is actually more vulnerable to disruption than NVIDIA's CUDA. While CUDA is tied to physical silicon, PyTorch is a software standard that must constantly adapt to an explosion of hardware accelerators." — Source: [Scaling AI Infrastructure Interview]
On the Talent Gap in Silicon: "I've worked with many hardware companies over the years and it's easy to see the talent and direction gap between NVIDIA and the rest." — Source: [X / Twitter post]
On Training vs. Inference Hardware: "When it comes to inference and fine-tuning, NVIDIA's advantage in software won't hold much significance. They will inevitably have to face competition from AMD and custom silicon in terms of performance per dollar." — Source: [Reddit AMA]
On King-making: "We very much don't want to king-make the hardware side of things... if users are using a particular piece of hardware, then we want to support it." — Source: [Latent Space Podcast]
On Shedding the Moat: "Tools like OpenAI’s Triton allow frameworks to generate high-performance kernels without being tied exclusively to CUDA C++, slowly shedding the hardware moat." — Source: [Latent Space Podcast]
On Local Hardware Spend: "Suggest using Colab and cloud compute sources rather than blowing money on hardware you may rarely use. In a year these devices will drop in price significantly." — Source: [Reddit Advice to Beginners]
On the GPU Rich: "Even the GPU Rich are GPU Poor. The insatiable demand for compute exists at every level, even at companies like Meta." — Source: [Latent Space Podcast]
On GPU Preference: "I personally prefer having rather many small GPUs than one big one, even for my research experiments." — Source: [Reddit AMA]

Part 5: Engineering Management & Strategy

On the AI Fixer Mindset: "Focus on identifying where the AI industry or your internal infrastructure is stuck, and apply the necessary technical or organizational fix to unblock it." — Source: [Meta FAIR Retrospective]
On Automating the Tedious: "Embrace laziness by automating everything you don't want to do." — Source: [Soumith's Personal Blog]
On Toy Complexity: "Avoid engineers who build up their own 'toy complexity' to solve useless problems. Ground engineering work in applications with obvious, real-world benefits." — Source: [Soumith's Personal Blog]
On Navigating Bureaucracy: "A leader's role in a large corporation is to act as a buffer, navigating the bureaucracy so the team can stay focused purely on building." — Source: [Gradient Dissent Podcast]
On Intrinsic Motivation: "Long-term success and happiness in engineering are correlated with intrinsic motivation rather than extrinsic corporate rewards." — Source: [Meta FAIR Retrospective]
On Doing Less to Achieve More: "If the open-source community does more, then the core team needs to do less. Leverage external contributors to maintain a small, fast core." — Source: [PyTorch Developer Day]
On Sunk Costs in Academia: "Academia often suffers from sunk costs and inertia. Established researchers resist new methods until a breakthrough forces a shift in perspective." — Source: [Gradient Dissent Podcast]
On Institutional Agility: "The primary advantage of a small team is not brainpower, but the absence of coordinating meetings." — Source: [Thinking Machines Lab Announcement]
On the Value of Small Projects: "Sometimes you need to leave the massive leverage of a giant corporation to do something small again and test your own limits." — Source: [Thinking Machines Lab Announcement]
On Engineering Ground Truth: "You can't manage an infrastructure project through dashboards. You have to write the code yourself periodically to know where it hurts." — Source: [Latent Space Podcast]

Part 6: Navigating AI Research & Hype

On Recognizing Fads: "There are genuinely fads that come and go. I stopped working on GANs at some point because I just thought they were so unstable and it was not clear that they could be stabilized." — Source: [Gradient Dissent Podcast]
On Latent Spaces: "The generator exhibits linear properties in its latent space, enabling vector arithmetic operations on visual concepts." — Source: [DCGAN Paper (2015)]
On Model Instability: "I was scared more about the fact that CERN was using GANs than that they were using PyTorch, because I knew exactly how unstable GANs could be." — Source: [Latent Space Podcast]
On the SF Tech Bubble: "A lot of the narratives around AGI are just SF AGI-party talking points. Real-world AI diffusion and policy require a much deeper understanding of logistics." — Source: [Critique of Jensen Huang Interview]
On the Bitter Lesson: "Rich Sutton’s 'Bitter Lesson'—that compute always wins—may face friction against geopolitical constraints and the need for private data control, forcing a shift toward more data-frugal research." — Source: [Scaling AI Infrastructure Interview]
On Scaling Laws vs Architecture: "You still need fundamental architecture breakthroughs. You can't just scale a bad architecture indefinitely and expect intelligence." — Source: [PyTorch Developer Conference]
On Deep Learning Inertia: "It took a sledgehammer like AlexNet to break the institutional inertia that kept academia from embracing deep learning." — Source: [Gradient Dissent Podcast]
On High Information Density: "Blackboard lectures and whiteboarding the math behind scaling are far more valuable than high-level philosophical debates about consciousness." — Source: [X / Twitter post]
On Over-Optimizing Benchmarks: "If you build exclusively to win a benchmark, you will invariably ruin the experience of actually using the tool to discover new things." — Source: [PyTorch Design Philosophy]
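
The "On Latent Spaces" quote above refers to arithmetic on generator inputs, the classic example being "smiling woman" minus "neutral woman" plus "neutral man". The sketch below shows only that arithmetic, in plain Python. The three-dimensional vectors and their values are made up for illustration (the DCGAN paper samples 100-dimensional latent vectors), and no generator is included. Averaging a few exemplar vectors per concept before combining them follows the procedure the paper describes.

```python
def mean_vector(vectors):
    """Element-wise average of several equal-length latent vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def subtract(u, v):
    return [a - b for a, b in zip(u, v)]


def add(u, v):
    return [a + b for a, b in zip(u, v)]


# Made-up latent codes for three exemplars of each concept. In this toy,
# dimension 0 loosely tracks "smile" and dimension 1 tracks "gender".
smiling_woman = [[1.0, 1.0, 0.2], [0.8, 1.2, -0.1], [1.2, 0.8, -0.1]]
neutral_woman = [[0.0, 1.1, 0.1], [0.1, 0.9, 0.0], [-0.1, 1.0, -0.1]]
neutral_man = [[0.0, -1.0, 0.1], [0.1, -1.1, -0.2], [-0.1, -0.9, 0.1]]

# smiling woman - neutral woman + neutral man
smile_direction = subtract(mean_vector(smiling_woman), mean_vector(neutral_woman))
smiling_man_z = add(smile_direction, mean_vector(neutral_man))

print([round(component, 3) for component in smiling_man_z])
```

With a trained generator, the resulting vector would be passed in as its input. Here the result simply carries the "smile" component of the first group and the "gender" component of the third.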

Part 7: On Career, Learning & Motivation

On Accidental Careers: "I think I just happened to like something that is now one of the coolest things in the world... the first thing I tried to become was a 3D VFX artist, but I turned out to be very bad at it." — Source: [Latent Space Podcast]
On Reading the Ecosystem: "Probably the highest-leverage 45 mins I spend every day is catching up with what's going on in AI through focused podcasts and papers." — Source: [Latent Space Podcast]
On Daily Grinding: "In the early days of a project, the unglamorous work is everything. I was reading 500 notifications a day across Twitter and forums to manually debug user problems." — Source: [Gradient Dissent Podcast]
On Staying Technical: "The moment you stop writing code and only review architectures, you lose your intuition for what makes a tool fast or frustrating." — Source: [Meta FAIR Retrospective]
On Choosing Projects: "Work on things that have a high probability of helping your immediate peers. If the people next to you find it useful, it will likely scale." — Source: [Soumith's Personal Blog]
On Knowing When to Quit a Research Path: "If you spend years patching the instability of a model and the fundamentals don't improve, it's a signal to abandon that architecture." — Source: [Gradient Dissent Podcast]
On Dealing with Legacy: "Maintaining legacy systems like Torch7 taught us what not to do when we started building the next iteration." — Source: [PyTorch Developer Day]
On Personal Reach: "Leverage isn't just about how many servers you control; it's about how much friction you remove for other smart people." — Source: [Meta FAIR Retrospective]
On Starting Over: "Leaving an 11-year seat of power to join a startup is necessary to prove to yourself that your skills are sharp, not just your platform." — Source: [Thinking Machines Lab Announcement]

Part 8: The Decentralized Future of AI

On Monopoly Mechanics: "I don't subscribe to the 'one model to rule them all' narrative. Human organizational limitations—not technical ones—will prevent mega-corporations from dominating every single AI vertical." — Source: [Scaling AI Infrastructure Interview]
On Local Deployments: "Niche applications and local deployments will drive a massive, decentralized ecosystem that no single entity can control." — Source: [Scaling AI Infrastructure Interview]
On the Diffusion of AI: "The true impact of AI won't be measured by the largest model in a data center, but by how deeply it penetrates standard, boring enterprise software." — Source: [Critique of Jensen Huang Interview]
On Data Frugality: "Because companies will want to keep their data private, there will be massive economic incentives to figure out how to train smart models on less data." — Source: [Scaling AI Infrastructure Interview]
On Robotics and AI: "The next frontier isn't just text on a screen. Integrating models with physical robotics requires solving latencies that cloud API calls cannot tolerate." — Source: [Thinking Machines Lab Announcement]
On the Neutral Layer: "The entire AI ecosystem benefits when the lowest layer of execution is a public good, much like Linux was for the internet." — Source: [Latent Space Podcast]
On Open Source vs. Closed Source Velocity: "A closed team can move faster for the first year. But by year three, the collective intelligence of the open internet will always outpace a single building of engineers." — Source: [PyTorch Developer Day]
On Institutional Trust: "Trust in AI systems will ultimately stem from code you can compile yourself, not API endpoints you have to blindly query." — Source: [Latent Space Podcast]
On the Real Future of AGI: "I care less about creating a god-like intelligence and more about creating tools that make every human significantly more capable at their specific craft." — Source: [Gradient Dissent Podcast]
