
Lessons from Werner Vogels
As Amazon's Chief Technology Officer, Werner Vogels drove the company's move into cloud computing and shaped its principles for building distributed systems at scale. This profile draws on his talks, interviews, and frameworks to show how Amazon approaches writing software and structuring its engineering teams.
Part 1: Embracing System Failures
- On Inevitability: "Everything fails, all the time." — Source: [All Things Distributed]
- On Preparedness: "Plan for failure, and nothing will fail." — Source: [AWS re:Invent Keynote]
- On Blast Radius: "Our goal is to use reliability, scalability, and performance to minimize the impact and reduce the blast radius of those inevitable failures." — Source: [All Things Distributed]
- On Embracing the Unknown: "You have to accept that failures will happen, and you have to design your systems so they can continue to operate even when underlying components fail." — Source: [AWS Architecture Blog]
- On Graceful Degradation: "Systems must be designed to expect, detect, and recover from failures gracefully instead of trying to prevent every possible error." — Source: [All Things Distributed]
- On Partial Failures: "In complex environments, failure is not a matter of 'if' but 'when,' and partial failures are the hardest to debug." — Source: [ACM Queue Interview]
- On Operational Reality: "There is no compression algorithm for experience. The operational maturity AWS has gained over decades comes from living through failures." — Source: [All Things Distributed]
- On Database Lifespans: "Databases fail. The assumption that your data store will always be available is a dangerous foundation to build on." — Source: [AWS re:Invent Keynote]
- On Continuous Verification: "Verification debt is the new technical debt. If you don't verify how your system handles failure in production, you are blind." — Source: [AWS re:Invent 2025 Keynote]
- On Resilient Ecosystems: "By building resilient systems that maintain an acceptable level of service even when unhealthy, you shift the industry standard." — Source: [All Things Distributed]
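The idea of expecting failure and degrading gracefully, rather than trying to prevent every error, can be sketched in a few lines. This is a minimal illustration, not Amazon's implementation: the helper call_with_fallback and the functions personalized_recommendations and bestsellers are hypothetical names invented for this example.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_fallback(primary: Callable[[], T], fallback: Callable[[], T], attempts: int = 2) -> T:
    """Try the primary dependency a few times, then degrade to the fallback."""
    for _ in range(attempts):
        try:
            return primary()
        except Exception:
            # Expect failure: retry instead of letting the error reach the caller.
            continue
    # The primary is still unhealthy: serve a reduced but acceptable response.
    return fallback()


def personalized_recommendations() -> list[str]:
    # Hypothetical remote dependency, simulated here as being down.
    raise ConnectionError("recommendation service unavailable")


def bestsellers() -> list[str]:
    # Hypothetical static fallback that needs no remote call.
    return ["bestseller-1", "bestseller-2"]


result = call_with_fallback(personalized_recommendations, bestsellers)
print(result)
```

The caller still gets a usable answer while the dependency is down, which is one simple way to keep the blast radius of a component failure small.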
Part 2: Taming Complexity in Distributed Systems
- On Origins of Complexity: "Complexity grows from simple origins." — Source: [AWS re:Invent 2024 Keynote]
- On Discipline: "Simplicity requires discipline. It takes active effort to keep a system understandable as it scales." — Source: [All Things Distributed]
- On Tesler's Law: "Complexity can neither be created nor destroyed, it can only move somewhere else." — Source: [AWS re:Invent 2024 Keynote]
- On Mental Models: "A service is too big when you can't keep a mental model of it in your head." — Source: [All Things Distributed]
- On Scalability Definitions: "A service is said to be scalable if when we increase the resources in a system, it results in increased performance in a manner proportional to resources added." — Source: [All Things Distributed]
- On Consensus: "If you're concerned about scalability, any algorithm that forces you to run agreement will eventually become your bottleneck. Take that as a given." — Source: [All Things Distributed]
- On System Boundaries: "Service-oriented architecture gives you the level of isolation that allows you to build many software components rapidly and independently." — Source: [ACM Queue Interview]
- On Evolvability: "Make evolvability a requirement. If your system cannot change easily, it will eventually become irrelevant." — Source: [AWS re:Invent Keynote]
- On Automation Choices: "Don't ask yourself what to automate; ask what you do not automate." — Source: [AWS re:Invent Keynote]
- On Asynchronous Design: "Building asynchronous, event-driven architectures allows you to decouple components and scale them independent of each other." — Source: [AWS Architecture Blog]
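The decoupling described in the quote on asynchronous design can be illustrated with an in-process queue standing in for a message service. This is only a sketch under that assumption: the event shape, the order IDs, and the "shipped" handler are made up for the example, and a real system would use a durable queue rather than Python's queue module.

```python
import queue
import threading

# The queue is the only thing the producer and consumer share.
events = queue.Queue()
processed: list[str] = []


def consumer() -> None:
    """Handle events at its own pace, independent of the producer."""
    while True:
        event = events.get()
        if event is None:
            # Sentinel value: the producer has nothing more to send.
            break
        processed.append(f"shipped:{event['order_id']}")


worker = threading.Thread(target=consumer)
worker.start()

# The producer only publishes events; it never calls the consumer directly.
for order_id in ("A1", "B2", "C3"):
    events.put({"order_id": order_id})

events.put(None)
worker.join()
print(processed)
```

Because the producer knows nothing about who consumes the events, either side can be replaced or scaled out (for example, by starting more worker threads) without changing the other.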
Part 3: The Frugal Architect
- On Cost as a Requirement: "Make cost a non-functional requirement. Companies can fail because they don’t consider cost at every stage." — Source: [The Frugal Architect]
- On Unit Economics: "The math is simple: if costs are higher than your revenue, your business is at risk." — Source: [The Frugal Architect]
- On Aligning Business Value: "Systems that last align cost to business. As builders, we need to consider that revenue is crucial and use that knowledge to inform our choices." — Source: [The Frugal Architect]
- On Sustainable Growth: "Growth at all costs leads to a trail of destruction." — Source: [The Frugal Architect]
- On Architectural Trade-offs: "Architecting is a series of trade-offs. Frugality isn’t just about cutting costs. It’s about understanding how your spending ties directly to value." — Source: [The Frugal Architect]
- On Observability and Cost: "Unobserved systems lead to unknown costs. Without careful observation and measurement, the true costs of operating a system remain invisible." — Source: [The Frugal Architect]
- On Wasteful Habits: "Like a utility meter tucked away in a basement, lack of visibility enables wasteful habits." — Source: [The Frugal Architect]
- On Cost Controls: "Cost-aware architectures implement cost controls. You need to be able to switch some of those components off, and the switch should be in the hands of the business." — Source: [The Frugal Architect]
- On Incremental Optimization: "Cost optimization is an ongoing journey. Small savings can lead to significant cost reductions over time." — Source: [The Frugal Architect]
- On Historical Assumptions: "Unchallenged success leads to assumptions. The most dangerous phrase in the English language is 'we have always done it this way.'" — Source: [The Frugal Architect]
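One way to read the points on observability and cost controls is that each component should carry its own cost and value figures, and that non-essential components should be switchable when spend exceeds what the business accepts. The sketch below is an assumption-laden illustration: the Feature fields, the sample numbers, and the features_to_switch_off helper are all hypothetical and do not come from The Frugal Architect.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    monthly_cost: float     # hypothetical spend attributed to this feature
    monthly_revenue: float  # hypothetical revenue attributed to this feature
    essential: bool         # set by the business, not by engineering


def features_to_switch_off(features: list[Feature], budget: float) -> list[str]:
    """Return non-essential features to disable until total cost fits the budget."""
    total = sum(f.monthly_cost for f in features)
    # Consider the features with the worst margin (revenue minus cost) first.
    candidates = sorted(
        (f for f in features if not f.essential),
        key=lambda f: f.monthly_revenue - f.monthly_cost,
    )
    disabled = []
    for feature in candidates:
        if total <= budget:
            break
        disabled.append(feature.name)
        total -= feature.monthly_cost
    return disabled


catalog = [
    Feature("checkout", 500.0, 5000.0, essential=True),
    Feature("recommendations", 300.0, 900.0, essential=False),
    Feature("animated_previews", 400.0, 100.0, essential=False),
]
disabled = features_to_switch_off(catalog, budget=900.0)
print(disabled)
```

With these sample numbers only the feature that costs more than it earns is switched off, and the essential checkout path is never a candidate.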
Part 4: Cloud and Serverless Computing
- On the Serverless Spectrum: "Lambda and serverless are not the same thing. Almost everything else in AWS is serverless by nature. Whether it’s SQS, S3, or DynamoDB, they’re all serverless." — Source: [All Things Distributed]
- On Business Logic: "With serverless technologies, customers can build their business logic into their applications and not have to worry about maintaining anything else." — Source: [AWS re:Invent Keynote]
- On Paying for Value: "In a serverless model, you pay for what you use, and you pay for direct value." — Source: [AWS Architecture Blog]
- On the Compute Continuum: "Compute lives on a continuum. You start off with the VMs, with the containers, with your functions." — Source: [AWS re:Invent Keynote]
- On Environmental Impact: "Sustainability is a freight train that is coming your way that you cannot escape, and cost is a pretty good approximation for the amount of resources that you’ve used." — Source: [AWS Blog]
- On The Lost Art of Efficiency: "As the speed of execution becomes more important, we kind of lost this art, this idea, of architecting for cost and resource efficiency." — Source: [The Frugal Architect]
- On Undifferentiated Heavy Lifting: "There is lots of innovation you can do. You no longer need to think about the infrastructure." — Source: [All Things Distributed]
- On Cloud-Native Databases: "Cloud-native databases are the foundation of innovation. They remove the traditional constraints of storage and compute." — Source: [AWS re:Invent Keynote]
- On the Pace of the Cloud: "If we do not continue to move fast and innovate really fast, we will be out of business in a decade." — Source: [All Things Distributed]
Part 5: Organizational Design and Autonomy
- On Two-Pizza Teams: "If you have more than ten people in your team, you will need to have meetings. Keep teams small to reduce communication overhead." — Source: [ACM Queue Interview]
- On Autonomy: "You should focus on fully delegating ownership to the level of two-pizza teams, letting that team truly own what they deliver." — Source: [All Things Distributed]
- On Job Satisfaction: "Letting the team own their outcomes allows them to love their work, while you support them and push them to produce results." — Source: [All Things Distributed]
- On Conway's Law: "In the fine-grained services approach that we use at Amazon, services do not only represent a software structure but also the organizational structure." — Source: [All Things Distributed]
- On Innovation Velocity: "The strong ownership model, combined with small team size, is intended to make it very easy to innovate without bureaucratic bottlenecks." — Source: [All Things Distributed]
- On Team Fitness Functions: "Teams should be measured by a fitness function: a single key business metric that the team owns and is responsible for improving." — Source: [All Things Distributed]
- On Decoupling: "Decoupled architectures require decoupled organizations. If your software is modular, your human teams must be modular too." — Source: [AWS Architecture Blog]
- On Building for the Business: "We are not just building technology for technology’s sake, we are building technology to support our businesses." — Source: [The Frugal Architect]
- On Minimizing Coordination: "Coordination is expensive. If teams have to coordinate constantly, you have drawn the service boundaries in the wrong place." — Source: [All Things Distributed]
Part 6: Working Backwards and Customer Focus
- On the Starting Point: "Start with your customer and work your way backwards." — Source: [All Things Distributed]
- On the PR/FAQ: "Writing a Press Release and FAQ before a single line of code is written forces clarity about the customer value." — Source: [All Things Distributed]
- On Desired Experiences: "Working backwards from the desired customer experience ensures that the team is solving a real problem rather than just building a feature." — Source: [All Things Distributed]
- On Feedback Loops: "95% of AWS services and features are based directly on customer feedback." — Source: [AWS re:Invent Keynote]
- On Solving Real Problems: "Don't build technology in search of a problem. Go solve real problems. Go help the world." — Source: [AWS re:Invent 2024 Keynote]
- On Intentional Design: "If you cannot articulate the value in a mock press release, you should not be building the product yet." — Source: [All Things Distributed]
- On Eliminating Friction: "Customers want their problems solved faster, cheaper, and with less friction. Architecture should serve those three demands." — Source: [AWS Architecture Blog]
- On Long-Term Thinking: "We build systems that are expected to last. To do that, you must align your technical decisions with long-term customer needs." — Source: [All Things Distributed]
- On the Builders' Mandate: "Now, go build!" — Source: [AWS re:Invent Keynote]
Part 7: Operations and Ownership
- On End-to-End Responsibility: "You build it, you run it." — Source: [ACM Queue Interview]
- On Aligning Incentives: "When the developers who write the code are also responsible for its operation, it naturally aligns incentives and shortens feedback loops." — Source: [ACM Queue Interview]
- On Operational Health: "You cannot manage a system effectively if you do not understand its operational realities in production." — Source: [All Things Distributed]
- On Software Evolution: "Software becomes irrelevant if we do not evolve it. Maintaining software is a continuous process of adaptation." — Source: [AWS re:Invent 2024 Keynote]
- On the Developer Role: "The role of the developer does not end at deployment; it begins there." — Source: [All Things Distributed]
- On Incident Response: "How a team handles an outage tells you more about their engineering culture than how they write code." — Source: [All Things Distributed]
- On Security Responsibility: "Security is everyone's job now, not just the security team's." — Source: [AWS re:Invent Keynote]
- On Data Gravity: "I love that the world is data-intensive… unfortunately, it is called Big Data, which obscures the operational reality of managing it." — Source: [All Things Distributed]
- On Operational Metrics: "If you cannot measure it, you cannot operate it. Instrumentation is not an afterthought; it is a core feature." — Source: [AWS Architecture Blog]
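Treating instrumentation as a core feature can be as simple as wrapping every operation so that call counts, error counts, and latency are recorded by default. The following is a minimal sketch, assuming an in-memory metrics dictionary; the instrumented decorator and the lookup_order function are hypothetical, and a production system would export these numbers to a monitoring service instead.

```python
import time
from collections import defaultdict
from functools import wraps

# In-memory metrics store, keyed by function name (an assumption for this sketch).
metrics = defaultdict(lambda: {"calls": 0, "errors": 0, "total_seconds": 0.0})


def instrumented(func):
    """Record calls, errors, and cumulative latency for the wrapped function."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        stats = metrics[func.__name__]
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception:
            stats["errors"] += 1
            raise
        finally:
            # Runs on both success and failure, so every call is counted.
            stats["calls"] += 1
            stats["total_seconds"] += time.perf_counter() - start
    return wrapper


@instrumented
def lookup_order(order_id: str) -> dict:
    # Hypothetical operation used only to exercise the decorator.
    if not order_id:
        raise ValueError("order_id is required")
    return {"order_id": order_id, "status": "shipped"}


lookup_order("A1")
try:
    lookup_order("")
except ValueError:
    pass

print(dict(metrics["lookup_order"]))
```

Because the measurement lives in the decorator, a new operation is observable the moment it is written, rather than after someone remembers to add logging.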
Part 8: The Future, AI, and Continuous Innovation
- On the Threat of AI: "Will AI take my job? Maybe. Will AI make me obsolete? Absolutely not if you evolve." — Source: [AWS re:Invent 2025 Keynote]
- On AI Tools: "Vibe coding is fine, but only if you pay close attention to what is being built. The work is yours, not the tools." — Source: [AWS re:Invent 2025 Keynote]
- On the Necessity of Failure: "Real learning only happens when you are engaged enough to fail." — Source: [AWS re:Invent 2025 Keynote]
- On Predicting the Future: "The best way to predict the future is to build it." — Source: [All Things Distributed]
- On Human-Computer Interaction: "Voice is the first truly natural interface. As we reduce friction, computing disappears into the background." — Source: [AWS re:Invent 2017 Keynote]
- On Emerging Tech: "Experimentation at this moment is important. When it comes to quantum computing and AI, try to figure out what this will mean for your company or for your business." — Source: [All Things Distributed]
- On Tool Abstraction: "AI can generate code faster than humans can comprehend it. This means your job shifts from writing lines of code to deeply understanding architecture and business constraints." — Source: [AWS re:Invent 2025 Keynote]
- On the Renaissance Developer: "Developers must become Renaissance builders, stepping back from syntax to master systems thinking, cost, and security." — Source: [AWS re:Invent 2025 Keynote]
- On Moving Forward: "Complexity, when approached thoughtfully, can drive innovation. Embrace it, learn from it, and keep building." — Source: [All Things Distributed]