The point of automation architecture is not to produce diagrams.
The point is to make better decisions before the workflow is in production, touching customers, writing to systems, and creating trust problems.
Use this worksheet before building or expanding an AI automation workflow. It is deliberately practical. If a team cannot answer these questions, the workflow is not ready.
1. Define the workflow
What is the workflow called?
What business outcome should it produce?
What starts it?
What is the source system?
What is the final desired state?
Who owns the workflow after launch?
What is explicitly out of scope?
If the answer is "the AI handles it," stop. That is not a workflow definition.
2. Classify the work
Break the workflow into steps.
For each step, mark the best owner:
| Step | Code | Model | Human | Reason |
|---|---:|---:|---:|---|
| Receive event | Yes | No | No | Deterministic |
| Validate input | Yes | No | No | Schema |
| Interpret messy text | No | Yes | Review edge cases | Ambiguous language |
| Apply policy threshold | Yes | No | Exceptions | Hard rule |
| Draft response | Support | Yes | Review if sensitive | Language generation |
| Send response | Yes | No | Yes if high risk | External action |
This is the automation boundary map.
3. Identify risk and reversibility
For each action, ask:
- What can go wrong?
- Who is affected?
- Is the action reversible?
- How quickly would we detect a mistake?
- What is the worst plausible outcome?
- Does this require approval?
Use this table:
| Action | Risk | Reversible? | Control |
|---|---|---:|---|
| Internal note | Low | Yes | Auto |
| CRM field update | Medium | Usually | Auto with audit and revert |
| Customer email draft | Medium | Yes | Human send |
| Customer email send | Medium/high | No | Confidence and policy gate |
| Refund | High | Sometimes | Approval threshold |
| Access change | High | Sometimes | Deterministic rule plus approval for exceptions |
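One way to make this table enforceable is to express it as data a decision gate can read. The action names, risk labels, and control names below are illustrative, not a fixed vocabulary:

```python
# The risk table above, expressed as data. All names are illustrative.
CONTROLS = {
    "internal_note":    {"risk": "low",    "reversible": True,  "control": "auto"},
    "crm_field_update": {"risk": "medium", "reversible": True,  "control": "auto_with_audit"},
    "email_draft":      {"risk": "medium", "reversible": True,  "control": "human_send"},
    "email_send":       {"risk": "medium_high", "reversible": False, "control": "confidence_policy_gate"},
    "refund":           {"risk": "high",   "reversible": False, "control": "approval_threshold"},
    "access_change":    {"risk": "high",   "reversible": False, "control": "rule_plus_approval"},
}

def control_for(action: str) -> str:
    # Unknown actions default to the safest control: human approval.
    return CONTROLS.get(action, {"control": "human_approval"})["control"]
```

The useful property is the default: an action the table does not know about falls back to human approval instead of running unchecked.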
4. Define model jobs narrowly
For every AI step, write:
- job type: classify, extract, draft, summarize, route, recommend
- input contract
- output schema
- allowed categories
- confidence expectations
- rationale requirement
- review triggers
- prompt version
- model version
- evaluation set
Example:
```json
{
  "job": "classify_support_ticket",
  "allowed_categories": ["billing", "product", "bug", "legal", "security", "unknown"],
  "output_schema": {
    "category": "string",
    "confidence": "number",
    "rationale": "string",
    "requires_review": "boolean"
  },
  "review_triggers": ["confidence < 0.85", "category in legal/security", "category == unknown"]
}
```
If the model job cannot be described this way, it is probably too broad.
Also define cost and latency expectations:
- expected monthly volume
- maximum acceptable model cost per item or per month
- latency budget for normal and peak periods
- fallback behavior if the model provider is slow or unavailable
- when deterministic pre-filters should skip the model call
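The last point is worth a sketch: a deterministic pre-filter sits in front of the model call and skips it whenever a rule already decides the outcome. The subjects, field names, and stubbed model call below are assumptions for illustration:

```python
# Hypothetical pre-filter: skip the model call when a deterministic
# rule already decides the outcome. All names are illustrative.

AUTO_CLOSE_SUBJECTS = {"unsubscribe", "out of office"}

def classify_ticket(ticket: dict) -> dict:
    subject = ticket.get("subject", "").strip().lower()
    # Deterministic rule: known-pattern tickets never reach the model.
    if subject in AUTO_CLOSE_SUBJECTS:
        return {"category": "auto_close", "confidence": 1.0,
                "source": "rule", "requires_review": False}
    # Empty input fails validation before spending model budget.
    if not ticket.get("body"):
        return {"category": "unknown", "confidence": 0.0,
                "source": "rule", "requires_review": True}
    return call_model(ticket)  # the metered, slower path

def call_model(ticket: dict) -> dict:
    # Stub standing in for the real model call.
    return {"category": "product", "confidence": 0.9,
            "source": "model", "requires_review": False}
```

Every ticket the pre-filter catches is a model call you do not pay for and a latency budget you do not spend.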
5. Set decision gates
Define what happens after the AI step.
| Condition | Action |
|---|---|
| Output fails schema | Retry once, then exception |
| Confidence high and low risk | Auto proceed |
| Confidence medium | Human review or sampled review |
| Confidence low | Exception queue |
| Sensitive category | Human approval |
| Irreversible action | Human approval or staged action |
Do not collect confidence if the workflow ignores it.
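The gate table above can be collapsed into one small function. The thresholds, category names, and risk labels here are assumptions; tune them per workflow:

```python
# A minimal sketch of the decision gate. Thresholds and category
# names are illustrative; set them per workflow.

SENSITIVE = {"legal", "security"}

def gate(output, risk: str) -> str:
    """Map a parsed model output to the next action."""
    if output is None:  # schema validation failed upstream
        return "retry_then_exception"
    if output["category"] in SENSITIVE:
        return "human_approval"
    confidence = output["confidence"]
    if confidence >= 0.85 and risk == "low":
        return "auto_proceed"
    if confidence >= 0.60:
        return "human_review"
    return "exception_queue"
```

Note that confidence alone never authorizes an action: a high-confidence output on a medium-risk action still goes to review.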
6. Design state and retries
Answer:
- What is the workflow ID?
- What is the source event ID?
- What is the idempotency key?
- What states can the workflow enter?
- Which errors are retryable?
- What is the max retry count?
- Where do exceptions go?
- Who owns the exception queue?
- What is the SLA?
Minimum states:
```text
RECEIVED -> VALIDATED -> AI_COMPLETED -> GATE_DECIDED -> ACTION_COMPLETED
                                                      -> HUMAN_REVIEW_PENDING
                                                      -> EXCEPTION_PENDING
                                                      -> FAILED
```
The exact names matter less than having explicit states.
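One way to make states explicit is an allowed-transition map that refuses anything not listed. The transitions below are one plausible wiring of the states sketched above, not a prescription:

```python
# Explicit state machine: every legal transition is listed, everything
# else raises. Transitions here are illustrative.

TRANSITIONS = {
    "RECEIVED": {"VALIDATED", "FAILED"},
    "VALIDATED": {"AI_COMPLETED", "EXCEPTION_PENDING", "FAILED"},
    "AI_COMPLETED": {"GATE_DECIDED", "FAILED"},
    "GATE_DECIDED": {"ACTION_COMPLETED", "HUMAN_REVIEW_PENDING",
                     "EXCEPTION_PENDING"},
    "HUMAN_REVIEW_PENDING": {"ACTION_COMPLETED", "EXCEPTION_PENDING"},
    "EXCEPTION_PENDING": {"VALIDATED", "FAILED"},
}

def advance(current: str, nxt: str) -> str:
    """Refuse transitions the map does not allow."""
    if nxt not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {nxt}")
    return nxt
```

The payoff is debuggability: a workflow that jumps from RECEIVED to ACTION_COMPLETED fails loudly instead of silently skipping the gate.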
7. Define observability
Capture:
- workflow ID
- event ID
- input reference
- schema version
- prompt version
- model version
- model output reference
- parsed output
- confidence score
- validation result
- gate decision
- human review decision
- external side effect IDs
- retry history
- final status
Ask one blunt question: if this fails next month, can we explain what happened in ten minutes?
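A decision record carrying the fields above might look like the following. Every identifier here (workflow name, versions, side-effect IDs) is a made-up placeholder; the point is the shape, not the values:

```python
# Illustrative decision record: enough context to reconstruct what
# happened without re-running anything. All values are placeholders.
record = {
    "workflow_id": "support_triage_v2",
    "event_id": "evt_8231",
    "schema_version": "1.3",
    "prompt_version": "triage-2024-05",
    "model_version": "model-x-2024-04",
    "parsed_output": {"category": "billing", "confidence": 0.91},
    "validation": "pass",
    "gate_decision": "auto_proceed",
    "human_review": None,          # not triggered for this item
    "side_effect_ids": ["crm_update_5512"],
    "retries": 0,
    "final_status": "ACTION_COMPLETED",
}
```

If a record like this exists for every item, the ten-minute explanation is a query, not an archaeology project.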
8. Build evaluation loops
Define:
- gold set examples
- sampled review rate
- failure taxonomy
- regression tests
- drift checks
- review cadence
- owner for prompt/policy changes
Evaluation is not a research project. It is maintenance for a production workflow.
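In its simplest form, the gold set is a frozen list of examples with expected labels, run before any prompt or model change ships. The examples, the stubbed classifier, and the 95% threshold below are all assumptions:

```python
# Gold-set regression check in its simplest form. The classifier is a
# stub; in practice it wraps the real model call under test.

GOLD_SET = [
    ({"body": "I was charged twice"}, "billing"),
    ({"body": "App crashes on login"}, "bug"),
]

def classify(ticket: dict) -> str:
    # Stub standing in for the model call being evaluated.
    return "billing" if "charged" in ticket["body"] else "bug"

def regression_pass_rate(gold_set, min_rate=0.95):
    """Return (pass rate, whether the change is safe to ship)."""
    hits = sum(classify(ticket) == label for ticket, label in gold_set)
    rate = hits / len(gold_set)
    return rate, rate >= min_rate
```

A failing rate blocks the prompt change the same way a failing unit test blocks a code change.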
9. Define security controls
Confirm:
- secrets are excluded from prompts and logs
- credentials are scoped to the workflow, tenant, and action
- retrieved documents and tool outputs are treated as untrusted input
- prompt-injection controls exist for agent/tool workflows
- external actions require approval, staging, or rate limits based on risk
- logs are redacted where full content is unnecessary
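The redaction point can be sketched as a small pass applied to text before it is logged. The patterns below are illustrative and deliberately incomplete; a production redactor needs a reviewed, tested pattern set:

```python
import re

# Minimal redaction pass before logging. Patterns are illustrative,
# not exhaustive: real deployments need a reviewed pattern set.
PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "<email>"),
    (re.compile(r"(?i)bearer\s+\S+"), "<token>"),
]

def redact(text: str) -> str:
    """Replace matches of each known-sensitive pattern."""
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```

Redacting at the logging boundary means a single missed call site leaks data, so the safest designs route all log writes through this one function.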
10. Assign ownership after launch
Every automation needs an owner.
Not just a builder. An owner.
Ownership includes:
- monitoring dashboards
- reviewing exceptions
- maintaining policies
- approving prompt/model changes
- reviewing failure patterns
- communicating incidents
- naming an incident owner when something breaks
- weekly review for new workflows, then monthly review once stable
- a retirement trigger, such as sustained low usage, high exception rate, or policy mismatch
- retiring or redesigning decayed workflows
If nobody owns it, the workflow will decay silently.
11. Decide the launch shape
Choose one:
- shadow mode: automation recommends, humans act
- draft mode: automation creates drafts, humans send
- assisted mode: automation acts on low-risk cases, reviews the rest
- full automation: only for low-risk, well-observed, reversible work
Most serious AI automation should not start at full automation. Earn autonomy.
Final operator check
Before launch, confirm:
- deterministic parts are handled by code
- AI steps are bounded
- humans own accountability points
- confidence gates change behavior
- state is durable
- side effects are idempotent
- retries are controlled
- logs can reconstruct decisions
- security follows least privilege
- prompts, logs, and tool calls exclude unnecessary sensitive data
- external actions are gated
- evaluation loops exist
- owner is named
If this feels like too much, that is the signal.
The work is not just making AI do something once.
The work is making the workflow reliable enough that people keep trusting it after the first weird edge case.
