Lessons from Jim Keller

Jim Keller is a microprocessor engineer who has led processor designs at AMD, Apple, and Tesla, favoring clean-slate designs over incremental tweaks. He argues that breaking through a performance plateau requires periodically discarding the old architecture and starting from scratch. This collection outlines his technical and management strategies for engineers building complex systems.

Part 1: The Philosophy of Computer Architecture

  1. On the essence of architecture: "Computer architecture is fundamentally about predicting the future of software and building hardware that will be ready when it arrives." — Source: [Lex Fridman Podcast #70]
  2. On instruction execution: "Despite the incredible complexity of modern processors, about 90% of execution time is spent on just 25 instructions." — Source: [AnandTech Interview]
  3. On simplicity versus complexity: "The most elegant designs emerge only after stripping away layers of accumulated legacy features that were added to solve yesterday's problems." — Source: [TechTechPotato]
  4. On backward compatibility: "Maintaining support for every legacy instruction creates a tax on power and area that eventually limits future performance leaps." — Source: [EE Times]
  5. On predictability: "A good architecture moves data predictably, which is often more important for overall system performance than peak theoretical output." — Source: [Tenstorrent TT-Deploy Keynote]
  6. On the role of the architect: "The architect's job is to arbitrate the competing demands of the software, the power budget, and the physical limits of the silicon." — Source: [Lex Fridman Podcast #70]
  7. On data movement: "In modern systems, compute is almost free. The real challenge, and the real cost in power, is simply moving data from where it is stored to where it is processed." — Source: [EE Times]
  8. On heterogeneous computing: "The future requires specialized engines for specific tasks, carefully coordinated, rather than a single massive core handling everything." — Source: [Creative Destruction Lab]
  9. On vector math: "As transistor budgets grow, the fundamental unit of computation shifts from scalar operations to massive vector and matrix calculations." — Source: [FII Priority Summit]
  10. On designing for the compiler: "Hardware is useless if the software cannot target it. The best hardware architectures are co-designed closely with the compilers that will feed them." — Source: [Lex Fridman Podcast #162]
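
Item 2 in the list above says that execution time concentrates in a small set of instructions. The sketch below shows one way such a claim could be checked against a dynamic instruction trace. The trace, the opcode names, and the helper top_k_coverage are all hypothetical and exist only for illustration; it counts executed instructions rather than cycles, and it makes no attempt to reproduce the 90% figure.

```python
from collections import Counter


def top_k_coverage(trace, k):
    """Fraction of executed instructions covered by the k most frequent opcodes."""
    if not trace:
        raise ValueError("trace must not be empty")
    counts = Counter(trace)
    # most_common(k) returns the k (opcode, count) pairs with the highest counts.
    covered = sum(n for _, n in counts.most_common(k))
    return covered / len(trace)


# Hypothetical dynamic trace: one entry per executed instruction.
trace = ["load"] * 4 + ["add"] * 3 + ["store"] * 2 + ["div"]

coverage = top_k_coverage(trace, 2)  # share covered by the two hottest opcodes
print(f"top-2 opcodes cover {coverage:.0%} of the trace")
```

On a real workload the trace would come from a simulator or a hardware profiler, and weighting each opcode by its cycle cost would be closer to "execution time" than the raw counts used here.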

Part 2: Breaking Through Diminishing Returns

  1. On incremental improvements: "You can often find an easy 10% performance gain by tweaking an existing design, but eventually, this iterative process just adds bloat and power consumption." — Source: [Lex Fridman Podcast #70]
  2. On the three-to-five-year rule: "To achieve a massive leap in performance, you generally have to throw out the old architecture and start from a clean slate every three to five years." — Source: [AnandTech Interview]
  3. On the local maximum: "Engineering teams naturally optimize for the local maximum. Breaking out of it requires top-down leadership willing to reset the foundation." — Source: [More Than Moore]
  4. On legacy bloat: "Adding execution units and larger caches yields diminishing returns over time. The chip eventually becomes a massive traffic jam of its own making." — Source: [TechTechPotato]
  5. On the danger of comfort: "When an engineering team gets too comfortable with a successful architecture, they stop looking for the radical changes needed for the next generation." — Source: [Tenstorrent TT-Deploy Keynote]
  6. On refactoring hardware: "Just as software engineers must occasionally refactor their codebase, hardware engineers must periodically redesign the processor pipeline from the ground up." — Source: [EE Times]
  7. On uncovering bottlenecks: "Once you solve the primary bottleneck in a system, a new, previously hidden bottleneck will immediately become the limiting factor." — Source: [Lex Fridman Podcast #162]
  8. On the value of constraints: "Strict power and thermal limits often force the most creative and efficient architectural decisions." — Source: [FII Priority Summit]
  9. On clean-slate design: "The fear of breaking things prevents clean-slate designs. Without breaking things, you cannot achieve generational leaps." — Source: [Creative Destruction Lab]
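
Item 7 in the list above describes how fixing one bottleneck exposes the next. A minimal sketch of that effect, assuming a simple model in which a pipeline's throughput equals that of its slowest stage. The stage names and throughput numbers are hypothetical.

```python
def bottleneck(stages):
    """Return (name, throughput) of the slowest stage in a pipeline."""
    name = min(stages, key=stages.get)
    return name, stages[name]


# Hypothetical stage throughputs, in arbitrary units of work per second.
stages = {"fetch": 8.0, "decode": 4.0, "execute": 6.0, "memory": 5.0}

before = bottleneck(stages)  # the stage that currently limits the system

# "Solve" the primary bottleneck by making that stage much faster.
stages[before[0]] = 10.0

after = bottleneck(stages)  # a different stage now sets the limit

print(f"before: {before[0]} at {before[1]}, after: {after[0]} at {after[1]}")
```

In this model the overall gain is capped by the second-slowest stage, however much the first fix improves the stage it targets.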

Part 3: The Reality of Moore's Law

  1. On the supposed death of Moore's Law: "Moore's Law is a battle cry and a continuous observation of exponential human ingenuity, rather than a strict rule about transistor scaling." — Source: [Lex Fridman Podcast #70]
  2. On the definition of scaling: "People define Moore's Law too narrowly by focusing only on lithography. True scaling comes from thousands of micro-innovations across materials, packaging, and architecture." — Source: [Creative Destruction Lab]
  3. On physics limits: "We are nowhere near the fundamental physical limits of computation. There is still a vast amount of room at the bottom for atomic-scale engineering." — Source: [Lex Fridman Podcast #162]
  4. On the illusion of stopping: "Every decade, experts claim we have hit the physical wall of silicon. Every decade, a new engineering breakthrough proves them wrong." — Source: [More Than Moore]
  5. On transistor economics: "While the cost of leading-edge wafers has increased, the overall economic value delivered by new architectures continues to scale exponentially." — Source: [EE Times]
  6. On 3D packaging: "The next dimension of Moore's Law involves stacking logic and memory vertically to reduce distance, moving beyond flat planar scaling." — Source: [Tenstorrent Keynote]
  7. On the economics of progress: "Moore's Law is as much an economic law as a technical one. The expectation of better chips funds the R&D to create them." — Source: [Lex Fridman Podcast #70]
  8. On the innovation pipeline: "There are dozens of viable technologies in the lab right now that will guarantee computing performance continues to scale for the next twenty years." — Source: [TechTechPotato]
  9. On the ingenuity of engineers: "The true driver of Moore's Law is the refusal of thousands of engineers to accept that a problem is unsolvable." — Source: [Creative Destruction Lab]

Part 4: AI Hardware and the Future of Compute

  1. On the nature of AI inference: "AI inference is fundamentally a networking and memory bandwidth challenge, rather than strictly a compute problem." — Source: [Tenstorrent TT-Deploy Keynote]
  2. On GPU overhead: "Modern GPUs carry massive hidden overheads in hardware scheduling and complex memory management that reduce the actual compute efficiency of the silicon." — Source: [EE Times]
  3. On software-defined data movement: "Removing hardware schedulers and letting the compiler handle data movement achieves significantly higher efficiency for AI workloads." — Source: [Tenstorrent Keynote]
  4. On the AI hype cycle: "The industry is currently in a hype cycle of building hyper-specialized accelerators. History shows that simpler, more general-purpose scaling eventually wins." — Source: [TechTechPotato]
  5. On tensor splitting: "The key to scaling AI models is an architecture that can seamlessly split and route tensors across massive clusters of chips without bottlenecks." — Source: [FII Priority Summit]
  6. On the cost of AI: "Hardware must become drastically cheaper and more programmable for AI to reach its full potential." — Source: [Lex Fridman Podcast #162]
  7. On neural network flexibility: "Hardware designers need to build chips that mimic the flexibility of neural networks, rather than forcing neural networks to adapt to rigid hardware." — Source: [Tenstorrent TT-Deploy Keynote]
  8. On accelerator limits: "Many current AI chips are over-designed for specific models and will become obsolete the moment the dominant neural network architectures change." — Source: [More Than Moore]
  9. On feeding the cores: "You can put as many teraflops on a chip as you want, but if you cannot feed them with data fast enough, it is just wasted silicon." — Source: [EE Times]
  10. On the brain as a model: "Re-engineering the human brain's efficiency remains a target for AI hardware, though we operate orders of magnitude away from its power efficiency." — Source: [Lex Fridman Podcast #162]
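
Items 1 and 9 in the list above argue that memory bandwidth, not peak compute, often sets the limit. One common way to reason about this is the roofline model, used here as an illustrative stand-in and not as a method attributed to the quotes. The sketch assumes attainable throughput is the smaller of the compute peak and bandwidth times arithmetic intensity (operations per byte moved). The peak and bandwidth figures are hypothetical and do not describe any real chip.

```python
def attainable_tflops(peak_tflops, bandwidth_tb_s, flops_per_byte):
    """Roofline bound: the lower of the compute roof and the memory roof."""
    # TB/s multiplied by FLOP/byte gives TFLOP/s, so both terms share a unit.
    return min(peak_tflops, bandwidth_tb_s * flops_per_byte)


# Hypothetical accelerator, not a real product.
PEAK_TFLOPS = 100.0
BANDWIDTH_TB_S = 1.0

# A kernel doing 10 operations per byte moved is limited by memory.
memory_bound = attainable_tflops(PEAK_TFLOPS, BANDWIDTH_TB_S, 10.0)

# A kernel doing 200 operations per byte moved can reach the compute peak.
compute_bound = attainable_tflops(PEAK_TFLOPS, BANDWIDTH_TB_S, 200.0)

print(f"memory-bound kernel: {memory_bound} TFLOP/s")
print(f"compute-bound kernel: {compute_bound} TFLOP/s")
```

Under these assumed numbers the first kernel can use only a tenth of the chip's peak, which is the "wasted silicon" case: adding more compute units would not raise its throughput at all.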

Part 5: Abstraction Layers and Complexity

  1. On the necessity of abstraction: "Computing only works because we build abstraction layers that hide the underlying complexity from the layer above it." — Source: [Lex Fridman Podcast #70]
  2. On leaky abstractions: "The biggest system failures occur when an abstraction layer leaks, forcing a software developer to understand the physical quirks of the hardware." — Source: [Creative Destruction Lab]
  3. On managing complexity: "You cannot manage the complexity of a modern microprocessor by understanding every transistor. You must trust the defined interfaces between components." — Source: [Tenstorrent Keynote]
  4. On the computing stack: "From quantum physics to high-level software, the entire computing stack is a series of agreed-upon lies that make engineering possible." — Source: [Lex Fridman Podcast #162]
  5. On the cost of features: "Every new feature adds complexity, and complexity is the enemy of reliability and performance scaling." — Source: [EE Times]
  6. On the power of simplicity: "The most profound engineering breakthroughs usually involve finding a way to collapse multiple complex layers into a single, simple, and elegant abstraction." — Source: [TechTechPotato]
  7. On hardware-software co-design: "The boundary between hardware and software should blur to optimize for the whole system instead of isolating the parts." — Source: [AnandTech Interview]
  8. On the trap of specialization: "Over-specializing an abstraction layer for a specific use case often ruins its utility for future, unforeseen applications." — Source: [More Than Moore]
  9. On the foundation of logic: "At the very bottom, beneath the software and the architecture, it all comes down to switches turning on and off." — Source: [Lex Fridman Podcast #70]

Part 6: Building and Leading Engineering Teams

  1. On interdisciplinary teams: "Computers are designed by teams of people with vastly different skill sets. A successful team requires many different kinds of thinkers." — Source: [Lex Fridman Podcast #70]
  2. On belief in success: "When taking on an entrenched competitor, the first and hardest job of a leader is convincing the team that they can actually win." — Source: [Lex Fridman Podcast #162]
  3. On order versus chaos: "As you increase order in an organization, productivity rises, but if you mandate too much order, productivity plummets because innovation requires a degree of chaos." — Source: [TechTechPotato]
  4. On the force of bureaucracy: "Once a company starts moving toward order and process, the force vector driving that bureaucracy becomes almost unstoppable." — Source: [More Than Moore]
  5. On empowering engineers: "The best leaders clearly define the problem and give the team the resources to solve it, leaving the exact technical implementation to the engineers." — Source: [Tenstorrent Keynote]
  6. On the value of dissent: "A healthy engineering culture encourages aggressive technical arguments. If everyone agrees on the architecture on day one, it is probably not ambitious enough." — Source: [AnandTech Interview]
  7. On hiring talent: "You hire for the underlying ability to adapt to problems that haven't been discovered yet, rather than hiring strictly for specific existing skills." — Source: [EE Times]
  8. On rebuilding culture: "Changing a company's technical trajectory requires simultaneously rebuilding its technical architecture and its social architecture." — Source: [FII Priority Summit]
  9. On the role of the manager: "A manager's job is to protect the engineers from the corporate noise so they can focus entirely on the design." — Source: [Lex Fridman Podcast #70]
  10. On the necessity of excellence: "When you look at the layout of a processor, it has to be great. You cannot make it great if you do not fundamentally know what you are doing." — Source: [Creative Destruction Lab]

Part 7: Instruction Sets and Open Standards

  1. On the RISC-V advantage: "RISC-V allows companies to build custom, flexible processors and integrate them with AI accelerators without being locked into a rigid, proprietary stack." — Source: [Tech Threads Podcast]
  2. On instruction length: "Fixed-length instructions seem nice for simple computers, but when you build massive processors, the instruction length does not matter that much." — Source: [Tenstorrent Keynote]
  3. On open versus closed ecosystems: "Open standards win by harnessing the collective intelligence of the entire industry instead of relying on the budget of a single company." — Source: [EE Times]
  4. On the chiplet revolution: "The shift to chiplets and multi-die packaging will be messy in the short term, but it is absolutely necessary for the future of scaling." — Source: [More Than Moore]
  5. On the commoditization of cores: "The CPU core is increasingly becoming a commodity block. The real value is in how you connect it to memory and specialized accelerators." — Source: [TechTechPotato]
  6. On legacy baggage: "The volume of legacy instructions that x86 must support makes designing clean, highly efficient decoders incredibly difficult." — Source: [AnandTech Interview]
  7. On hardware flexibility: "An open architecture like RISC-V provides the freedom to strip out unnecessary components, resulting in a leaner, more power-efficient design." — Source: [FII Priority Summit]
  8. On standardizing interconnects: "The chiplet ecosystem needs open, standardized die-to-die interconnects to thrive, avoiding a fragmented landscape of proprietary links." — Source: [Tenstorrent TT-Deploy Keynote]
  9. On custom silicon: "Companies will increasingly assemble custom silicon solutions using standardized, interoperable blocks instead of buying off-the-shelf processors." — Source: [Lex Fridman Podcast #162]

Part 8: Risk, Chaos, and Innovation

  1. On taking arrows: "Being on the bleeding edge of a new technology paradigm means enduring failure before the rest of the industry catches up." — Source: [More Than Moore]
  2. On the necessity of risk: "Engineering requires taking risks that have a genuine chance of failing. Doing anything less is simply maintenance." — Source: [Creative Destruction Lab]
  3. On the nature of breakthroughs: "True innovation rarely looks like a clean progression. It usually looks like a chaotic, high-stress scramble to solve impossible problems." — Source: [Lex Fridman Podcast #70]
  4. On first principles: "Every time you start a new project, you have to brutally interrogate your own assumptions and tear down your established first principles." — Source: [AnandTech Interview]
  5. On the illusion of safety: "Sticking to a safe, proven architecture in a rapidly changing industry is the most dangerous decision a company can make." — Source: [EE Times]
  6. On learning from failure: "A failed silicon tape-out is devastatingly expensive, but it provides a density of learning that you cannot get from a hundred successful simulations." — Source: [TechTechPotato]
  7. On the pace of innovation: "The technology industry does not reward companies that pace themselves. It rewards companies that sprint relentlessly toward the next horizon." — Source: [FII Priority Summit]
  8. On embracing the unknown: "The most exciting phase of chip design is the beginning, when you have a blank whiteboard and absolutely no idea how you will meet the performance targets." — Source: [Tenstorrent Keynote]
  9. On relentless drive: "The desire to build something vastly better than what currently exists is a fundamental human drive that powers the entire technology sector." — Source: [Lex Fridman Podcast #162]