In any system, one constraint determines throughput.
In a factory, it's the machine with the longest queue. In a team, it's the person or process that everything waits on. In an organization, it's the approval gate that every critical project queues behind.
The failure pattern is always the same: everyone optimizes around the bottleneck instead of fixing it. The non-bottleneck resources get more efficient. The bottleneck gets more constrained. The system becomes more dependent on it. Throughput doesn't improve — it degrades.
This is the Theory of Constraints in plain language, and it's one of the most consistently ignored principles in organizational design.
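To make the claim concrete, here is a minimal sketch that models a system as a chain of stages, each with a daily rate. The stage names and numbers are invented for illustration, and the model assumes a simple serial pipeline where the slowest stage caps the output of the whole chain.

```python
def system_throughput(stage_rates):
    """Items per day the whole chain can finish: the slowest stage's rate."""
    return min(stage_rates.values())


def bottleneck(stage_rates):
    """Name of the stage with the lowest rate."""
    return min(stage_rates, key=stage_rates.get)


# Hypothetical rates, in work items per day (not taken from the article).
rates = {"build": 10, "review": 3, "deploy": 8}
before = system_throughput(rates)

# Speed up a non-bottleneck stage: system throughput does not move.
after_non_bottleneck = system_throughput({**rates, "build": 20})

# Raise the bottleneck's rate: system throughput moves with it.
after_bottleneck = system_throughput({**rates, "review": 5})

print(bottleneck(rates), before, after_non_bottleneck, after_bottleneck)
```

Doubling the build rate changes nothing, while a modest gain at review lifts the whole chain. Real systems are messier than a single `min`, but the direction of the result is the point.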
The Compounding Problem
Here's what happens when you optimize everything except the true constraint: you make the constraint worse.
Imagine a team where one senior engineer's sign-off is required on every critical feature. The team responds by improving code quality, adding automated tests, and reducing the number of review cycles. These are all good things. But they also increase the dependency on that senior engineer: the team now produces review-ready work faster, so more work goes through them, not less.
Now the bottleneck is tighter than before. The team's velocity is limited by how fast that one person can review. The improvements around them have increased the pressure on the one resource they can't scale.
This is the paradox of local optimization around a constraint: the improvements make the constraint more expensive to fix, because the system's dependence on it has grown.
When You Can't Fix the Constraint Immediately
Sometimes the constraint can't be removed quickly — the person with the knowledge can't be replaced, the approval gate is a regulatory requirement, the technology doesn't exist yet.
In that case, the goal is to protect the constraint from unnecessary load. Every piece of work that goes through the bottleneck should be as ready as possible when it arrives — no rework, no missing context, no repeated reviews. Reduce the non-value-added touches so the constraint's limited capacity is used only on work that moves forward.
This is the "drum-buffer-rope" concept from the Theory of Constraints: the constraint sets the pace (the drum), upstream work is released only as fast as the constraint can consume it (the rope), and a buffer of ready work in front of the constraint absorbs variability and protects it from starvation.
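A toy simulation can show what the rope buys you. This is a sketch under simplifying assumptions, not a full drum-buffer-rope implementation: time moves in discrete ticks, rates are fixed, and the names `simulate`, `upstream_rate`, `drum_rate`, and `buffer_target` are made up for this example.

```python
def simulate(ticks, upstream_rate, drum_rate, buffer_target=None):
    """Return (items finished, queue left in front of the constraint).

    buffer_target=None means upstream pushes work as fast as it can.
    Otherwise the rope releases only enough work to refill the buffer.
    """
    queue = 0
    finished = 0
    for _ in range(ticks):
        if buffer_target is None:
            released = upstream_rate
        else:
            # Rope: release only what brings the buffer back to its target.
            released = min(upstream_rate, max(0, buffer_target - queue))
        queue += released
        # Drum: the constraint works at its own fixed pace.
        done = min(drum_rate, queue)
        queue -= done
        finished += done
    return finished, queue


# Hypothetical numbers: upstream can produce 5 per tick, the constraint handles 2.
push = simulate(ticks=20, upstream_rate=5, drum_rate=2)
roped = simulate(ticks=20, upstream_rate=5, drum_rate=2, buffer_target=4)
print(push, roped)
```

Both runs finish the same number of items, because the constraint is never idle in either. The difference is the pile of waiting work: the push run ends with a long queue in front of the constraint, while the roped run holds only a small buffer.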
Concrete example: an engineering team where one principal engineer is the mandatory reviewer on all critical infrastructure changes. You can't clone them. The review requirement isn't going away. The team's response is to run a "review readiness" process: no infrastructure PR goes up for review until it has two peer approvals, a test run in staging, and a one-paragraph summary of the change and rollback plan. The principal's review time drops from 45 minutes per PR (spent catching basics, answering questions, chasing down context) to 15 minutes (the PR is ready, the questions are already answered, the decision is clear). The constraint, the principal's review capacity, produces more throughput without being replaced. The team's own work to get ready is what freed it up.
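The arithmetic behind that example is worth spelling out. The 45 and 15 minute figures come from the example above; the six hours of weekly review time is an assumption added here purely to turn the ratio into concrete numbers.

```python
MINUTES_BEFORE = 45        # per PR, from the example above
MINUTES_AFTER = 15         # per PR, from the example above
REVIEW_HOURS_PER_WEEK = 6  # assumption for illustration, not from the example


def reviews_per_week(minutes_per_review, hours_per_week):
    """Whole reviews that fit into the available review time."""
    return (hours_per_week * 60) // minutes_per_review


before = reviews_per_week(MINUTES_BEFORE, REVIEW_HOURS_PER_WEEK)
after = reviews_per_week(MINUTES_AFTER, REVIEW_HOURS_PER_WEEK)
print(before, after, after / before)
```

Whatever the real number of review hours is, the ratio holds: cutting the time per review to a third triples the number of reviews the same person can complete.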
The Goldratt reference earns its weight here: improving a non-bottleneck does not improve the system's throughput. But you can reduce the load on the bottleneck without replacing it, and that is often the highest-leverage action available while a structural fix is being arranged.
