Lessons from Gavin Uberti

Gavin Uberti is the co-founder and CEO of Etched, a semiconductor startup building application-specific integrated circuits designed exclusively for transformer architectures. He left Harvard to develop the Sohu chip, operating on the thesis that hardwiring models into silicon bypasses complex software constraints and fundamentally reduces inference costs. This profile collects his perspectives on hardware economics, the limitations of general-purpose GPUs, and the technical requirements for real-time artificial intelligence.

Part 1: The Transformer Thesis

On the dominance of transformers: "The entire AI industry is converging on a single architecture, and that changes the fundamental math of hardware design." — Source: [Invest Like the Best]
On architectural consolidation: "We looked at the research and realized that almost every major breakthrough was simply a transformer applied to a new domain." — Source: [Etched Company Blog]
On the end of general-purpose AI: "If transformers are the future, building chips that do anything else is a waste of silicon." — Source: [TechCrunch Found Podcast]
On specialization: "ASICs historically win when algorithms stabilize. Transformers have stabilized enough to burn them into hardware." — Source: [Masters of Scale]
On the bet they are making: "Etched is a binary bet. If transformers are replaced tomorrow, we lose, but if they stay the standard, we build the most important chip in the world." — Source: [Invest Like the Best]
On model size versus architecture: "Models are getting larger and datasets are expanding, but the underlying mathematical operations remain remarkably consistent." — Source: [Power Law with John Coogan]
On the longevity of attention mechanisms: "The paper 'Attention Is All You Need' served as a literal blueprint for the next decade of silicon architecture." — Source: [Pioneers of AI Podcast]
On historical hardware trends: "Every computing paradigm eventually moves from CPUs to GPUs to ASICs. AI is currently entering its ASIC phase." — Source: [MIT Technology Review]
On algorithmic overhang: "People worry about a new architecture arriving, but the tooling and ecosystem lock-in around transformers makes displacing them incredibly difficult." — Source: [Invest Like the Best]
On the simplicity of the thesis: "We aren't betting on a specific model, we are betting on the fundamental math that powers them all." — Source: [Etched Company Blog]

Part 2: Challenging the Incumbents

On NVIDIA's generalist approach: "GPUs have to be good at rendering graphics, scientific simulation, and AI. That versatility is a tax on performance." — Source: [Invest Like the Best]
On wasted silicon: "When you run a transformer on a GPU, a massive percentage of the chip's real estate is completely turned off." — Source: [TechCrunch Found Podcast]
On the CUDA advantage: "NVIDIA's software moat is formidable. If you hardwire the model into the chip, however, you bypass the need for a complex compiler like CUDA entirely." — Source: [Masters of Scale]
On fighting Goliath: "You cannot beat NVIDIA at their own game. You have to play a different game by refusing to be general-purpose." — Source: [Power Law with John Coogan]
On market positioning: "We are trying to replace the GPU for inference, where the volume and economics matter most, rather than competing on training hardware." — Source: [Invest Like the Best]
On memory bandwidth: "GPUs are bottlenecked by how fast they can move data. By changing the architecture, we fundamentally shift the memory bandwidth constraints." — Source: [Etched Company Blog]
On incumbency inertia: "Large companies are trapped by their own success. NVIDIA cannot abandon its legacy architectures without alienating its base." — Source: [Pioneers of AI Podcast]
On hardware utilization: "We achieve an order of magnitude more performance simply by keeping our compute units fed one hundred percent of the time." — Source: [Invest Like the Best]
On generic compute: "Assuming GPUs will forever dominate AI is akin to assuming CPUs would forever dominate graphics." — Source: [MIT Technology Review]
On taking the hard path: "Building a new chip is hard, but trying to optimize generic hardware for highly specific workloads is a dead end." — Source: [TechCrunch Found Podcast]

Part 3: The Architecture of Sohu

On design philosophy: "Sohu is designed with one single objective: to run transformer inference faster and cheaper than the physical limitations of a standard GPU allow." — Source: [Etched Company Blog]
On removing control flow: "Because we know exactly what operations are coming next in a transformer, we strip out the complex control flow logic that clutters standard processors." — Source: [Invest Like the Best]
On matrix multiplication: "We built a giant matrix multiplication engine with a memory hierarchy tuned specifically for the sizes used in large language models." — Source: [Power Law with John Coogan]
On power efficiency: "When you don't have to power unneeded silicon, the energy cost per token plummets." — Source: [Masters of Scale]
On chip sizing: "We maximize the die area dedicated to pure arithmetic logic units, which explains why our throughput is fundamentally higher." — Source: [Pioneers of AI Podcast]
On tree search in silicon: "Sohu is capable of parallelizing generation in ways that allow for massive tree search, which is how we enable complex agentic workflows." — Source: [Invest Like the Best]
On the manufacturing process: "We use standard TSMC nodes. The magic resides entirely in the logical layout of the chip, rather than a novel manufacturing process." — Source: [TechCrunch Found Podcast]
On memory walls: "Everyone hits a memory wall in inference. We designed our SRAM layout specifically to hold the KV cache closer to the compute core." — Source: [Etched Company Blog]
On speculative decoding: "Our architecture natively supports advanced techniques like speculative decoding because the hardware understands the strict structure of the model." — Source: [Invest Like the Best]
On scale-out networking: "A single chip is never enough. We designed the interconnects to allow scaling across racks seamlessly without adding software overhead." — Source: [Power Law with John Coogan]
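
The "memory wall" quote in this part turns on how large the KV cache gets during inference. A rough back-of-the-envelope sketch of that arithmetic follows. The function name and the model dimensions are hypothetical, chosen only for illustration; they do not describe Sohu or any specific model, and the formula assumes a standard decoder that stores one key and one value vector per layer, per KV head, per token.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size, bytes_per_value=2):
    """Estimate KV cache size in bytes for a decoder-only transformer.

    Assumes keys and values are both cached (the factor of 2) and that
    every value is stored at the same precision (2 bytes = 16-bit).
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)


# Hypothetical dimensions, for illustration only.
size = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128,
                      seq_len=4096, batch_size=1)
print(f"{size / 2**20:.0f} MiB per sequence")  # prints "512 MiB per sequence"
```

Because the estimate grows linearly with both sequence length and batch size, serving many long conversations at once multiplies it quickly, which is why where the cache sits relative to the compute units matters so much.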

Part 4: Software and Complexity

On the compiler tax: "Writing kernels to squeeze performance out of GPUs takes thousands of engineers. We bypass the compiler completely." — Source: [TechCrunch Found Podcast]
On the history of TVM: "My time working on Apache TVM taught me that software optimization can only take you so far before you hit the hardware's permanent ceiling." — Source: [Invest Like the Best]
On hardwiring software: "We take what normally requires millions of lines of compiler code and literally etch it into the silicon layout." — Source: [Etched Company Blog]
On the developer experience: "Users shouldn't have to rewrite their models for our chip. If it's a transformer, it runs out of the box." — Source: [Masters of Scale]
On ignoring legacy support: "By refusing to support anything other than transformers, we eliminated the vast majority of complexity in our software stack." — Source: [Invest Like the Best]
On kernel development: "Startups die trying to write custom kernels for every new model variation. Our hardware natively understands the fundamental operations." — Source: [Power Law with John Coogan]
On the PyTorch ecosystem: "We hook directly into the frameworks people already use so that the underlying hardware remains invisible to the researcher." — Source: [Pioneers of AI Podcast]
On software abstractions: "Software abstractions on GPUs are leaky because the hardware wasn't built for the task. We align the abstraction directly with the metal." — Source: [MIT Technology Review]
On open-source models: "The proliferation of open-weight models means inference hardware can target standard architectures without needing to know proprietary model weights." — Source: [Invest Like the Best]

Part 5: The Economics of Inference

On the bottleneck of AI: "Training is a fixed cost, but inference is an ongoing operational expense. Inference is where the unit economics of AI will be decided." — Source: [Masters of Scale]
On cost per token: "To make AI ubiquitous, the cost per token needs to drop by multiple orders of magnitude, which general GPUs simply cannot achieve." — Source: [Invest Like the Best]
On compute constraints: "Right now, product design is constrained entirely by how much compute developers can afford to use per user interaction." — Source: [TechCrunch Found Podcast]
On the data center footprint: "We can replace multiple server racks of GPUs with a single server of Sohu chips, fundamentally altering data center economics." — Source: [Etched Company Blog]
On capital intensity: "Raising $120 million sounds like a lot, but in silicon, it's the bare minimum required to get a competitive chip manufactured at scale." — Source: [Power Law with John Coogan]
On margin structures: "Software companies are used to high margins. In AI, compute costs are destroying those margins, and better hardware is the only fix." — Source: [Invest Like the Best]
On total cost of ownership: "Hyperscalers care about total cost above everything. If we lower the energy and footprint while increasing throughput, the math forces them to switch." — Source: [Pioneers of AI Podcast]
On democratizing access: "Cheaper inference is fundamentally about enabling a developer in a dorm room to run autonomous agents at scale." — Source: [MIT Technology Review]
On the hardware lifecycle: "Because the transformer architecture is stable, our chips won't become obsolete in twelve months like standard hardware cycles." — Source: [TechCrunch Found Podcast]

Part 6: Building a Hardware Startup

On the difficulty of silicon: "In software, you can push a patch on Friday. In hardware, a mistake means you burn millions of dollars and lose a year." — Source: [Masters of Scale]
On pacing the team: "You have to run a hardware startup with software startup urgency, which breaks a lot of traditional engineering norms." — Source: [Invest Like the Best]
On hiring criteria: "We intentionally hire engineers who question why chip design has remained slow for the last twenty years, rather than only seeking industry veterans." — Source: [Power Law with John Coogan]
On tape-out pressure: "The moment you finalize the design for tape-out is the most stressful day in a hardware founder's life." — Source: [TechCrunch Found Podcast]
On IP blocks: "We use standard memory and interconnect IP so we can focus all our innovation exclusively on the compute core." — Source: [Etched Company Blog]
On building credibility: "When you are a team of college dropouts building silicon, the only way to get taken seriously is to show the math. The math is undeniable." — Source: [Invest Like the Best]
On investor skepticism: "Venture capitalists traditionally avoid hardware because of the capital requirements and long feedback loops. We had to prove this was a different paradigm." — Source: [Pioneers of AI Podcast]
On focus: "A startup's biggest advantage is singularity of focus. NVIDIA has a thousand product lines while we have exactly one." — Source: [MIT Technology Review]
On co-founder dynamics: "Building Etched requires a blend of deep technical conviction and aggressive capital allocation. Robert and I divide that labor fiercely." — Source: [Power Law with John Coogan]

Part 7: Leaving University and Taking Risks

On timing: "The window to build the foundational hardware for the AI era is open right now. It won't be open in three years when I would have graduated." — Source: [TechCrunch Found Podcast]
On the decision to drop out: "Harvard is great, but the opportunity cost of sitting in a classroom while the biggest platform shift in history happens was simply too high." — Source: [Masters of Scale]
On academic versus real-world speed: "Academia teaches you to minimize errors. Startups require you to maximize velocity, even if things break in the process." — Source: [Invest Like the Best]
On conviction: "You have to believe in your thesis enough to walk away from a guaranteed path. For us, the transformer thesis was that strong." — Source: [Power Law with John Coogan]
On youth as an advantage: "Not knowing exactly how hard something is supposed to be is often the only reason you are willing to attempt it." — Source: [Pioneers of AI Podcast]
On early validation: "Closing our seed round was the market confirming that our high-risk idea was actually viable." — Source: [Etched Company Blog]
On the Thiel Fellowship model: "There is a lineage of young founders leaving school for massive shifts. We view AI hardware as the ultimate platform shift." — Source: [Forbes 30 Under 30 Profile]
On missing college life: "Building something that could alter the trajectory of computing is a completely fair trade for missing senior year." — Source: [Invest Like the Best]
On risk asymmetry: "The worst-case scenario was we fail and go back to school. The best-case scenario is we build the engine of the future." — Source: [MIT Technology Review]

Part 8: The Future of Real-Time AI

On the latency barrier: "Right now, talking to an AI feels like talking over a laggy radio. We are building the hardware to make it feel like talking to a human in the room." — Source: [Invest Like the Best]
On autonomous agents: "Agents require thousands of internal loops and reasoning steps before they act. You cannot do that economically without specialized hardware." — Source: [TechCrunch Found Podcast]
On new product categories: "Cheap, instantaneous inference will create products we haven't even conceived of yet, because developers won't be constrained by compute budgets." — Source: [Masters of Scale]
On multimodal futures: "Video generation and real-time voice translation are just math problems, and we are building the ultimate calculator for them." — Source: [Etched Company Blog]
On systemic efficiency: "If AI is going to manage our grids and run our software, the physical infrastructure it runs on cannot consume the power of small nations." — Source: [Power Law with John Coogan]
On the speed of thought: "The goal is to generate text and code faster than humans can read it, making the AI an instantaneous extension of the user." — Source: [Pioneers of AI Podcast]
On real-time search: "Searching the internet will shift from retrieving documents to synthesizing thousands of documents in milliseconds." — Source: [Invest Like the Best]
On AI replacing software: "Eventually, we won't write deterministic code. We will run continuous inference loops, which requires massive hardware throughput." — Source: [MIT Technology Review]
On the long-term vision: "Our objective extends beyond building a faster chip; we are working to lower the baseline cost of machine intelligence." — Source: [TechCrunch Found Podcast]
