The bottleneck moves

Agents move the bottleneck from creation to evaluation.

That does not mean the operator should read every generated word slowly. It means the operator needs a review system that finds the places where judgment matters: truth, fit, tradeoffs, risks, taste, and decision readiness.

In the agent era, review is not cleanup. Review is the leverage point.

Review before polish

The wrong review sequence starts with grammar, formatting, and tone. That creates review theater: polished errors, clean hallucinations, tidy but useless artifacts.

Use this sequence instead:

  1. Source fidelity: Which claims are grounded? Which claims need evidence?
  2. Outcome fit: Does the output answer the actual brief or merely the literal instruction?
  3. Assumptions: What did the agent assume without saying so?
  4. Contradictions: Does it conflict with source material, prior decisions, or itself?
  5. Tradeoffs: What cost, risk, or second-order effect is missing?
  6. Usefulness: Can the intended reader, user, or decision-maker act on it?
  7. Boundary check: Did the agent attempt anything outside scope?
  8. Decision: accept, revise, split, escalate, discard, or archive.
  9. Polish: only now fix language.

The order matters. If you polish first, you become attached to material that may not deserve to survive.
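
One way to make the ordering hard to skip is to encode it, so that polish is gated on the decision. The following is a minimal sketch, not a prescribed tool: the stage identifiers mirror steps 1 to 7 above, while `run_review`, `ReviewResult`, the check functions, and the rule that only an "accept" decision unlocks polish are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Steps 1-7 of the sequence, in order. The identifiers are illustrative.
SUBSTANTIVE_STAGES = [
    "source_fidelity",
    "outcome_fit",
    "assumptions",
    "contradictions",
    "tradeoffs",
    "usefulness",
    "boundary_check",
]

# Step 8: the decisions named in the sequence.
DECISIONS = ("accept", "revise", "split", "escalate", "discard", "archive")

# A check takes the artifact text and returns the issues it found (hypothetical shape).
Check = Callable[[str], List[str]]


@dataclass
class ReviewResult:
    findings: Dict[str, List[str]]  # stage name -> issues found at that stage
    decision: str
    polish_allowed: bool  # step 9 is reachable only after the decision


def run_review(
    artifact: str,
    checks: Dict[str, Check],
    decide: Callable[[Dict[str, List[str]]], str],
) -> ReviewResult:
    """Run every substantive stage in order, then decide, then gate polish."""
    missing = [stage for stage in SUBSTANTIVE_STAGES if stage not in checks]
    if missing:
        raise ValueError(f"no check supplied for: {missing}")

    findings = {stage: checks[stage](artifact) for stage in SUBSTANTIVE_STAGES}

    decision = decide(findings)
    if decision not in DECISIONS:
        raise ValueError(f"unknown decision: {decision!r}")

    # Assumption of this sketch: only accepted material is worth polishing.
    return ReviewResult(findings, decision, polish_allowed=(decision == "accept"))
```

The design choice is that there is no way to ask for polish directly: the only path to `polish_allowed` runs through all seven substantive stages and an explicit decision.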

Review protocol

For any meaningful agent output, ask:

  • What claim would embarrass us if it were false?
  • What did the agent avoid deciding?
  • Where does the output depart from the brief?
  • What would break if this shipped?
  • What evidence is missing?
  • What is too generic to be useful?
  • What is the smallest acceptable version?
  • What should be escalated to a human before action?

The goal is not to punish the agent. The goal is to stay in contact with reality.

Review artifacts

Good operators leave review traces:

  • decision taken (accept, revise, split, escalate, discard, or archive);
  • material changes requested;
  • claims requiring evidence;
  • known residual risks;
  • source gaps;
  • next action;
  • memory update if the failure is likely to recur.
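
A review trace can be as small as one structured record per reviewed output. The sketch below assumes a Python dataclass; the class name `ReviewTrace`, its field names, and the `is_actionable` rule are hypothetical and simply map one field to each item in the list above.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ReviewTrace:
    decision: str                                   # the decision taken
    changes_requested: List[str] = field(default_factory=list)
    claims_needing_evidence: List[str] = field(default_factory=list)
    residual_risks: List[str] = field(default_factory=list)
    source_gaps: List[str] = field(default_factory=list)
    next_action: str = ""
    memory_update: Optional[str] = None             # set only if the failure is likely to recur

    def is_actionable(self) -> bool:
        # Assumption of this sketch: a trace is usable once it records
        # both a decision and a next action.
        return bool(self.decision) and bool(self.next_action)


trace = ReviewTrace(
    decision="revise",
    changes_requested=["replace the generic summary with the actual tradeoff"],
    claims_needing_evidence=["the cost estimate in the second section"],
    next_action="return to the agent with the two items above",
)
record = asdict(trace)  # plain dict, ready to log or store
```

Keeping the record this small is deliberate: a trace that takes longer to write than the review itself will not get written.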

This matters because agent workflows compound. If the same review failure happens three times and nothing in the system changes, the operator is not reviewing. They are repeatedly rescuing.
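
The three-strikes observation can be checked mechanically if each review logs a short failure label. This is a rough sketch under that assumption; the function name `recurring_failures` and the example labels are invented for illustration, and the threshold of three comes from the paragraph above.

```python
from collections import Counter
from typing import Iterable, List

RECURRENCE_THRESHOLD = 3  # "the same review failure happens three times"


def recurring_failures(
    failure_log: Iterable[str], threshold: int = RECURRENCE_THRESHOLD
) -> List[str]:
    """Return failure labels seen at least `threshold` times, sorted by name."""
    counts = Counter(failure_log)
    return sorted(label for label, seen in counts.items() if seen >= threshold)


# Hypothetical log: one label per failed review.
log = ["uncited claim", "scope creep", "uncited claim", "uncited claim"]
needs_system_change = recurring_failures(log)
```

Anything this function returns is a candidate for a change to the brief, the checks, or the agent's memory, instead of a fourth rescue.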

Failure mode

The classic failure is checking style while outsourcing judgment.

The modern version is worse: the operator skims a polished artifact, feels the comfort of completion, and ships work they would never have accepted if a junior human had produced it.

Review is where the super IC earns the leverage they borrowed from the machine.