Every automation has a boundary problem.
What should code decide? What should a model decide? What should a human decide?
If you do not answer those questions explicitly, the boundary gets decided accidentally, usually by whoever wrote the first prompt, connected the first API, or approved the first demo.
That is how teams end up with workflows that are impressive in a sandbox and frightening in production.
The boundary map
Before building, draw a simple map:
| Decision or action | Code | Model | Human | Notes |
|---|---:|---:|---:|---|
| Validate required fields | Owns | No | No | Schema and rules |
| Interpret messy message | Supports | Owns | Reviews exceptions | Classification and extraction |
| Apply hard policy threshold | Owns | No | Approves exceptions | Keep policy explicit |
| Draft customer response | Supports | Owns draft | Approves if sensitive | Use tone and policy checks |
| Send customer response | Owns mechanics | No | Owns if high risk | Auto-send only when low risk |
| Update CRM field | Owns | No | Reviews exceptions | Idempotent write |
| Approve refund | Enforces limits | Recommends | Owns final decision above threshold | Audit required |
The map is not bureaucracy. It is how you prevent responsibility from disappearing. Revisit it after incidents, policy changes, major model changes, or new external actions.
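The map can also live in the codebase instead of only in a document. A minimal sketch, assuming a small set of ownership labels; the action names and categories are illustrative, not a standard:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Boundary:
    action: str
    code: str    # e.g. "owns", "supports", "enforces", "none"
    model: str   # e.g. "owns", "recommends", "none"
    human: str   # e.g. "owns", "approves", "reviews_exceptions", "none"

# Explicit ownership per decision, mirroring a few rows of the table.
BOUNDARY_MAP = [
    Boundary("validate_required_fields", "owns", "none", "none"),
    Boundary("interpret_messy_message", "supports", "owns", "reviews_exceptions"),
    Boundary("approve_refund", "enforces", "recommends", "owns"),
]

def owner(action: str) -> Boundary:
    """Look up who owns a decision; fail loudly on unmapped actions."""
    for b in BOUNDARY_MAP:
        if b.action == action:
            return b
    raise KeyError(f"no boundary defined for {action!r}")
```

Failing loudly on unmapped actions is the point: a decision with no row in the map is a boundary nobody drew.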
Code owns exactness
Code should own anything that must be exact:
- required fields
- enum values
- permission checks
- rate limits
- policy thresholds
- idempotency keys
- dedupe logic
- retries
- durable state
- external writes
If a rule is important and expressible, do not outsource it to a prompt.
A model can suggest that a refund seems appropriate. Code should still check whether the amount is allowed, whether the user has permission, whether the ticket was already processed, and whether the action is reversible.
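Those checks can be sketched as a guard that runs regardless of what the model suggested. The threshold, permission name, and parameters here are assumptions for illustration:

```python
# Policy threshold lives in code, not in a prompt (value is illustrative).
REFUND_LIMIT = 100.00

def guard_refund(amount, user_permissions, already_processed, reversible):
    """Return (allowed, reason). Code decides; the model only recommended."""
    if amount > REFUND_LIMIT:
        return False, "amount above policy threshold"
    if "issue_refund" not in user_permissions:
        return False, "user lacks permission"
    if already_processed:
        return False, "ticket already processed"
    if not reversible:
        return False, "irreversible action requires human approval"
    return True, "ok"
```

The model's recommendation never reaches this function; it only determines whether the function gets called at all.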
Models own ambiguity
Models are useful when exact rules are too brittle because the input is language-heavy or judgment-heavy.
A model can decide whether a customer is angry, whether a ticket is about billing, whether a contract clause resembles a termination right, or whether a call summary contains a buying signal.
But the model should work inside constraints:
- fixed input contract
- fixed output schema
- allowed categories
- confidence score
- rationale
- review flag
- prompt and model version logging
The model can reason. It should not be allowed to silently invent new workflow states.
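Those constraints amount to a fixed output contract that code validates before anything downstream runs. A minimal sketch, assuming a small closed taxonomy; the field and category names are illustrative:

```python
from dataclasses import dataclass

# Closed category set: anything outside it is rejected, not absorbed.
ALLOWED_CATEGORIES = {"billing", "technical", "account", "other"}

@dataclass(frozen=True)
class Classification:
    category: str
    confidence: float      # 0.0 to 1.0
    rationale: str
    needs_review: bool
    prompt_version: str    # logged for every call
    model_version: str

def parse_model_output(raw: dict) -> Classification:
    """Validate the model's output against the contract or raise."""
    c = Classification(**raw)
    if c.category not in ALLOWED_CATEGORIES:
        raise ValueError(f"category {c.category!r} not in taxonomy")
    if not 0.0 <= c.confidence <= 1.0:
        raise ValueError("confidence out of range")
    return c
```

A rejected output becomes an exception for review, which is exactly where a novel case belongs.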
Humans own accountability
Humans belong where accountability matters.
That does not mean humans should click approve on everything; that is lazy design. It means placing humans where the system needs judgment, risk ownership, or exception handling.
Good human boundaries include:
- low-confidence model outputs
- irreversible actions
- sensitive customer communication
- legal, security, finance, or HR issues
- policy exceptions
- novel cases not covered by the taxonomy
- sampled review for quality control
Human review is not the opposite of automation. It is a control surface.
Use risk and reversibility
The simplest way to place the boundary is to score risk and reversibility.
| Risk | Reversibility | Suggested automation |
|---|---|---|
| Low | Easy to reverse | Automate with logging |
| Low | Hard to reverse | Add confirmation or delayed action |
| Medium | Easy to reverse | Automate with confidence gate and sampled review |
| Medium | Hard to reverse | Human approval or staged rollout |
| High | Easy to reverse | Human review plus audit trail |
| High | Hard to reverse | Human owns decision; automation assists only |
A wrong tag in a CRM is annoying. A wrong payroll change is serious. A wrong internal draft is recoverable. A wrong email to a regulator is not.
The architecture should reflect that.
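The table above is small enough to be executable. A sketch that makes the placement rule a lookup rather than tribal knowledge; the level names mirror the table and the exact strings are illustrative:

```python
# (risk, reversibility) -> suggested automation level, per the table.
AUTOMATION_MATRIX = {
    ("low", "easy"):     "automate_with_logging",
    ("low", "hard"):     "confirm_or_delay",
    ("medium", "easy"):  "automate_with_gate_and_sampling",
    ("medium", "hard"):  "human_approval_or_staged_rollout",
    ("high", "easy"):    "human_review_with_audit",
    ("high", "hard"):    "human_owns_decision",
}

def placement(risk: str, reversibility: str) -> str:
    """Return the suggested automation level; KeyError on unscored work."""
    return AUTOMATION_MATRIX[(risk, reversibility)]
```

A `KeyError` here is a feature: work that has not been scored for risk and reversibility should not be automated by default.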
Example: renewal risk workflow
A customer success team wants AI to identify renewal risk from call notes and support history.
Boundary design:
- Code gathers account data, open tickets, renewal date, ARR band, and meeting notes.
- Model summarizes risk signals and classifies renewal risk as low, medium, or high.
- Code checks whether the account is strategic, whether renewal is inside 90 days, and whether open escalations exist.
- Low-risk accounts get a CRM note automatically.
- Medium-risk accounts go to CSM review.
- High-risk strategic accounts create an escalation task for the account owner and manager.
- No customer-facing communication is sent without human approval.
The model identifies signals. Code enforces routing. Humans own the customer strategy.
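The routing half of that design can be sketched in a few lines. Field names are assumptions, and routing high-risk non-strategic accounts to CSM review is an assumption the workflow above leaves open:

```python
def route_renewal(account: dict, model_risk: str) -> str:
    """Code owns routing; the model only supplied the risk label."""
    if model_risk not in {"low", "medium", "high"}:
        raise ValueError(f"unknown risk label {model_risk!r}")
    if model_risk == "high" and account["strategic"]:
        return "escalation_task_for_owner_and_manager"
    if model_risk in {"medium", "high"}:
        return "csm_review"  # assumption: high non-strategic joins CSM review
    return "auto_crm_note"
```

Note what the model cannot do here: it cannot invent a fourth route, and it cannot send anything to the customer, because neither capability exists in the code path.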
The operator's rule
Draw the boundary before you build the workflow.
If you cannot say which decisions belong to code, model, and human, you are not designing automation. You are distributing risk randomly.
