The lazy debate is whether humans should be in the loop. Back-office work needs a better vocabulary. Some decisions deserve pre-approval. Some deserve post-review. Some deserve sampling. Some can run under thresholds. Some should stop at a recommendation. Some should never be delegated because the consequence is too sensitive or too hard to reverse.
Delegation depth is the design variable. A finance agent may draft an expense decision, approve low-risk claims under policy, route exceptions, and require pre-approval above a threshold. A procurement agent may prepare vendor intake but stop before spend commitment. A legal agent may suggest fallback language but require lawyer review before external redlines. A people-ops agent may answer simple policy questions but escalate anything sensitive.
The first dimension is reversibility. If the action can be undone cleanly, more delegation may be reasonable. If the action creates a legal obligation, employee consequence, external commitment, access exposure, payment, or public statement, the gate should be stronger. Reversibility is not just technical. A bad HR answer may be technically retractable but socially costly.
The second dimension is materiality. Dollar amount, contract value, compensation impact, data sensitivity, customer exposure, vendor criticality, employee impact, and regulatory risk all change approval design. A $40 reimbursement and a $400,000 vendor renewal should not share the same loop. A routine NDA and a non-standard liability clause should not share the same gate.
The third dimension is confidence. If the evidence is complete and the policy match is clean, the system can move further. If documents conflict, source data is stale, the policy is ambiguous, or the model is uncertain, the workflow should stop earlier. Confidence should affect the path, not merely decorate the answer.
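One way to make the first three dimensions concrete is a small routing function. The sketch below is illustrative only: the gate names, the `Task` fields, and every threshold are assumptions chosen for the example, not values from any real policy. It shows confidence changing the path rather than decorating the answer.

```python
from dataclasses import dataclass
from enum import IntEnum


class Gate(IntEnum):
    """Hypothetical gates, ordered from least to most human involvement."""
    AUTO_WITH_SAMPLING = 1
    POST_REVIEW = 2
    PRE_APPROVAL = 3
    RECOMMEND_ONLY = 4


@dataclass
class Task:
    reversible: bool         # can it be undone cleanly, socially as well as technically?
    amount_usd: float        # a stand-in for materiality
    confidence: float        # 0.0 to 1.0, assumed to come from evidence and policy checks
    evidence_conflict: bool  # documents disagree or source data is stale


def choose_gate(task, low_amount=100.0, high_amount=10_000.0, min_confidence=0.9):
    """Pick a gate from reversibility, materiality, and confidence.

    All thresholds are placeholder assumptions.
    """
    # Weak or conflicting evidence stops the workflow earliest.
    if task.evidence_conflict or task.confidence < min_confidence:
        return Gate.RECOMMEND_ONLY
    # Hard-to-reverse or material actions need a human before execution.
    if not task.reversible or task.amount_usd >= high_amount:
        return Gate.PRE_APPROVAL
    if task.amount_usd >= low_amount:
        return Gate.POST_REVIEW
    return Gate.AUTO_WITH_SAMPLING
```

Under these assumed thresholds, a clean $40 reimbursement and a $400,000 renewal land on different gates, and a low-confidence $40 claim still stops at a recommendation.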
The fourth dimension is reviewer capacity. A design that sends everything to humans will fail quietly. Review queues become the new bottleneck. Reviewers click approve because the queue is too large. The company then claims it has human oversight while actually running rubber-stamp automation. Good approval design measures queue age, reviewer load, rejection reasons, edit distance, and downstream errors.
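A rough sketch of what measuring the queue could look like follows. The `Review` record and its fields are hypothetical, and `edited` is only a crude proxy for edit distance. A very high approval rate combined with a near-zero edit rate and one overloaded reviewer is the kind of pattern that may indicate rubber-stamping.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import median


@dataclass
class Review:
    reviewer: str
    wait_hours: float  # queue age when the item was picked up
    approved: bool
    edited: bool       # crude proxy for edit distance (assumption)


def queue_health(reviews):
    """Summarize signals that hint at overload or rubber-stamping."""
    if not reviews:
        return {}
    total = len(reviews)
    load = Counter(r.reviewer for r in reviews)
    return {
        "median_wait_hours": median(r.wait_hours for r in reviews),
        "approval_rate": sum(r.approved for r in reviews) / total,
        "edit_rate": sum(r.edited for r in reviews) / total,
        "max_reviewer_load": max(load.values()),
    }
```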
Sampling is underrated. Some low-risk workflows can run automatically with random or risk-based review. That gives the system room to operate while still producing quality signals. Sampling should be explicit: what percentage, which risk triggers, who reviews, what happens when defects appear, and how the workflow learns.
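Making sampling explicit can be as simple as writing the policy down as data. The following is a minimal sketch under stated assumptions: the field names, the `finance-ops` reviewer queue, and the 2% defect limit are all invented for illustration.

```python
import random
from dataclasses import dataclass, field


@dataclass
class SamplingPolicy:
    base_rate: float                                  # fraction of ordinary items reviewed
    risk_triggers: set = field(default_factory=set)   # tags that always force review
    reviewer: str = "finance-ops"                     # hypothetical reviewer queue
    defect_rate_limit: float = 0.02                   # above this, tighten the gate


def should_review(policy, tags, rng):
    """Risk-based review first, then random sampling at the base rate."""
    if policy.risk_triggers & set(tags):
        return True
    return rng.random() < policy.base_rate


def next_mode(policy, reviewed, defects):
    """Decide what happens when defects appear in the sampled items."""
    if reviewed == 0:
        return "auto_with_sampling"
    if defects / reviewed > policy.defect_rate_limit:
        return "pre_approval"
    return "auto_with_sampling"
```

Passing in a seeded `random.Random` keeps the sampling decision reproducible for audits, which is a design choice of this sketch rather than a requirement.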
Previews are useful when action risk is high but preparation value is clear. The agent assembles the decision packet, drafts the update, shows the proposed tool call, and waits. This is often the right first state for legal, finance, procurement, and HR workflows. It lets the team gain leverage without pretending the system is ready for autonomy.
Post-review works when errors are cheap and visible. A routine classification, non-sensitive task update, or low-risk reminder may not need a human before action. But post-review still requires a defect process. If reviewers find repeated mistakes, the workflow should tighten gates, improve instructions, update policies, or narrow scope.
Delegation should be earned over time. A workflow can begin at draft-only, move to recommendation, then execution with pre-approval, then threshold-based execution, then sampling. Each increase should be tied to observed reliability, not enthusiasm. Trust escalation is an operating decision.
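The ladder can be sketched as a function that only moves a workflow when the review record supports it. The level names mirror the sequence above; the minimum sample size, the defect-rate limit, and the choice to step down one level after too many defects are assumptions made for this example.

```python
LADDER = [
    "draft_only",
    "recommend",
    "pre_approved_execution",
    "threshold_execution",
    "sampled_execution",
]


def next_level(current, reviewed, defects, min_reviewed=200, max_defect_rate=0.01):
    """Return the delegation level a workflow should hold next.

    Thresholds are illustrative assumptions, not recommended values.
    """
    idx = LADDER.index(current)
    if reviewed < min_reviewed:
        return current  # not enough evidence to move in either direction
    if defects / reviewed > max_defect_rate:
        return LADDER[max(idx - 1, 0)]  # reliability slipped: tighten one step
    return LADDER[min(idx + 1, len(LADDER) - 1)]  # reliability held: loosen one step
```

In practice the returned level would be a proposal for a named owner to accept, since the escalation itself is an operating decision.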
A useful approval matrix has rows for workflow types and columns for action depth: prepare, recommend, draft, execute with pre-approval, execute with post-review, execute under threshold, execute autonomously. Each cell should list risk triggers, required evidence, reviewer, log requirement, and rollback path.
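As a data structure, the matrix can be a mapping from (workflow, depth) to a cell. This is a minimal sketch: the workflow names, reviewers, and cell contents are hypothetical, and treating a missing cell as "not authorized" is an assumption of the example.

```python
from dataclasses import dataclass
from typing import Optional

DEPTHS = (
    "prepare",
    "recommend",
    "draft",
    "execute_with_pre_approval",
    "execute_with_post_review",
    "execute_under_threshold",
    "execute_autonomously",
)


@dataclass(frozen=True)
class Cell:
    risk_triggers: tuple
    required_evidence: tuple
    reviewer: str
    log_requirement: str
    rollback_path: str


# Hypothetical entries; a real matrix would be owned by the operating teams.
MATRIX = {
    ("expense_claim", "execute_under_threshold"): Cell(
        risk_triggers=("missing_receipt", "duplicate_claim"),
        required_evidence=("receipt", "policy_match"),
        reviewer="finance_owner",
        log_requirement="decision and evidence stored with the claim",
        rollback_path="reverse the reimbursement in the next payroll run",
    ),
    ("vendor_intake", "prepare"): Cell(
        risk_triggers=("new_vendor",),
        required_evidence=("intake_form",),
        reviewer="procurement_lead",
        log_requirement="intake packet stored",
        rollback_path="discard the draft packet",
    ),
}


def lookup(workflow: str, depth: str) -> Optional[Cell]:
    """Return the cell for a workflow at a given depth, or None if not authorized."""
    if depth not in DEPTHS:
        raise ValueError(f"unknown action depth: {depth}")
    return MATRIX.get((workflow, depth))
```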
The point is not to slow everything down. The point is to put human judgment where it matters. Back-office teams already suffer from vague approvals and unnecessary gates. Agentic workflows give companies a chance to redesign approval instead of automating the old mess.
Teams should also design rejection. What does the system learn when a reviewer says no? Was the evidence incomplete, the policy wrong, the source stale, the risk category too low, or the recommendation poorly written? If every rejection becomes a private human correction, the workflow will not improve.
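One way to keep rejections from staying private is to require a structured reason with every "no" and count them. The reason codes below follow the questions in the paragraph above; the enum and the helper are an illustrative sketch, not a prescribed taxonomy.

```python
from collections import Counter
from enum import Enum


class RejectionReason(Enum):
    EVIDENCE_INCOMPLETE = "evidence_incomplete"
    POLICY_WRONG = "policy_wrong"
    SOURCE_STALE = "source_stale"
    RISK_CATEGORY_TOO_LOW = "risk_category_too_low"
    POORLY_WRITTEN = "poorly_written"


def top_reasons(rejections, n=3):
    """Return the n most frequent rejection reasons with their counts."""
    return Counter(rejections).most_common(n)
```

A recurring top reason points at what to fix: stale sources suggest a data problem, while a risk category set too low suggests the gate itself needs tightening.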
The useful review queue is not just an approve button. It shows the reason for the recommendation, the evidence used, the action that will happen, and the consequence of approval. That makes oversight real instead of ceremonial.
The approval matrix should be readable by operators, not only system admins. A finance owner, legal reviewer, procurement lead, HR partner, or compliance manager should understand why a task stopped and what authority is being requested. If the logic is buried in automation rules, people will route around it.
Delegation depth also needs a review cadence. A workflow that was safe at low volume may become risky when adoption grows. A policy change may require tighter gates. A cleaner evidence base may allow looser gates. The approval model should move with observed reality.
Evidence note: NIST's AI Risk Management Framework (https://www.nist.gov/itl/ai-risk-management-framework) is a useful reference for risk-based governance, and Google Cloud's MLOps operationalization material (https://cloud.google.com/architecture/ai-ml/operationalize-mlops) helps frame monitoring and operational controls.
This is part 8 of 10 in Agentic Back Office.