Most companies start with a model choice. They should end with a routing policy.

A single default model is simple. It is also a lazy default once AI usage spreads. Some work needs the best reasoning model available. Some work needs low latency. Some work needs cheap classification. Some work needs strict data handling. Some work should never leave a particular vendor boundary. Some work can tolerate rough drafts. Some work needs near-perfect recall or careful tool use.

If every task goes to the same model, the company overpays for easy work and under-controls risky work.

Model routing is the control-plane answer. It turns model choice into a runtime decision based on task type, risk, context, budget, latency, privacy, and required quality.

This is not a product-management debate about which model is coolest this month. Operators need a practical tier system.

One tier handles cheap, repetitive work: classification, extraction, formatting, simple summaries, routing, deduplication, basic drafts, and internal transformations. Another tier handles judgment-heavy work: analysis, planning, synthesis, edge cases, strategy, complex coding, legal-adjacent review, security-sensitive triage, and high-value customer decisions. A third tier may be specialized: code, vision, speech, structured extraction, on-prem data, regulated workloads, or long-context retrieval.
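As a concrete illustration, the tiers above can be expressed as a small routing table. This is a sketch, not an implementation: the tier names, task classes, and model identifiers are all placeholders.

```python
from enum import Enum

class Tier(Enum):
    CHEAP = "cheap"            # classification, extraction, formatting
    PREMIUM = "premium"        # analysis, planning, complex coding
    SPECIALIZED = "specialized"  # code, vision, regulated workloads

# Hypothetical task-class -> tier mapping; extend per workflow inventory.
TIER_FOR_TASK = {
    "classification": Tier.CHEAP,
    "extraction": Tier.CHEAP,
    "simple_summary": Tier.CHEAP,
    "analysis": Tier.PREMIUM,
    "planning": Tier.PREMIUM,
    "complex_coding": Tier.PREMIUM,
    "vision": Tier.SPECIALIZED,
    "regulated": Tier.SPECIALIZED,
}

# Placeholder model names, not real products.
MODEL_FOR_TIER = {
    Tier.CHEAP: "small-fast-model",
    Tier.PREMIUM: "strong-reasoning-model",
    Tier.SPECIALIZED: "domain-model",
}

def model_for(task_class: str) -> str:
    # Unknown work defaults up, not down: unclassified tasks get the
    # stronger model until someone explicitly tiers them.
    tier = TIER_FOR_TASK.get(task_class, Tier.PREMIUM)
    return MODEL_FOR_TIER[tier]
```

The one deliberate design choice here is the default: unmapped task classes route to the premium tier, so the cost of a missing entry is money, not silent quality debt.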

The routing rule should not be "use cheap unless someone complains." That creates silent quality debt. It should define when quality matters enough to pay.

A customer renewal risk summary for the top fifty accounts may deserve a stronger model than a first-pass support tag. A contract clause extraction task may deserve a specialized workflow and tight validation. A batch of product-review sentiment labels may be fine on a cheap model if sampling shows accuracy is stable.

Routing also needs downgrade and fallback behavior. What happens when the preferred model is unavailable, rate-limited, too slow, or too expensive for the remaining budget? Does the system retry, route to a cheaper model, wait, ask a human, or fail closed? Too many AI workflows make this decision accidentally in code.
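One way to make that decision explicit rather than accidental is a declared fallback chain per tier. The names below are hypothetical; the point is that downgrade, fail-closed, and fail-open are written down as policy, not buried in a retry loop.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class FallbackPolicy:
    chain: list                # ordered model names, preferred first
    fail_closed: bool = True   # if nothing is available, raise rather than guess

def route_with_fallback(policy: FallbackPolicy,
                        available: Callable[[str], bool]) -> Optional[str]:
    """Return the first available model in the chain, or fail per policy."""
    for model in policy.chain:
        if available(model):
            return model
    if policy.fail_closed:
        raise RuntimeError("no model available; failing closed per policy")
    return None  # fail open: caller may queue the work or ask a human

# Example: premium work may downgrade one step, then fails closed.
premium = FallbackPolicy(chain=["strong-reasoning-model", "mid-tier-model"])
```

A cheap-tier policy might instead set `fail_closed=False` and queue the work, since a delayed classification is usually safer than a blocked pipeline.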

Capability tiers should be visible to operators. They need to know which workflows are consuming premium models, why, and with what quality result. Otherwise routing becomes another hidden engineering detail until the bill arrives or quality drops.

The best routing policies combine four signals.

First, task class. Is this extraction, generation, reasoning, code, ranking, search, or tool planning?

Second, risk. Could a bad output affect customers, money, security, compliance, production systems, strategy, or executive decisions?

Third, quality bar. Does the output need to be accepted as-is, reviewed, sampled, or treated as a rough input?

Fourth, economics. What is the acceptable cost and latency for this workflow, customer, feature, or team?

The control plane should make those signals explicit. A workflow can declare: "This is customer-visible, high-value, human-reviewed, latency-tolerant, and budgeted for premium reasoning." Another can declare: "This is internal classification, low-risk, sampled weekly, and cost-sensitive." The runtime can then route intelligently instead of relying on a hardcoded model name from six months ago.
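A minimal sketch of that declaration, with hypothetical field names: the workflow states its four signals once, and the runtime turns them into a tier instead of reading a hardcoded model name.

```python
from dataclasses import dataclass

@dataclass
class WorkflowProfile:
    task_class: str       # extraction, generation, reasoning, code, ...
    high_risk: bool       # could affect customers, money, security, compliance
    quality_bar: str      # "as-is", "reviewed", "sampled", or "rough"
    cost_sensitive: bool  # economics signal for this workflow

def choose_tier(p: WorkflowProfile) -> str:
    # Risk and quality override economics: cheap only when cheap is enough.
    if p.high_risk or p.quality_bar == "as-is":
        return "premium"
    if p.cost_sensitive and p.quality_bar in ("sampled", "rough"):
        return "cheap"
    return "standard"

# The two declarations from the text, expressed as profiles.
renewal_summary = WorkflowProfile("reasoning", high_risk=True,
                                  quality_bar="reviewed", cost_sensitive=False)
review_labels = WorkflowProfile("classification", high_risk=False,
                                quality_bar="sampled", cost_sensitive=True)
```

The precedence order is the policy: risk beats budget, and a workflow whose output is accepted as-is never lands on the cheap tier by accident.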

Routing policy also protects teams from model churn. The model market changes constantly. New models get faster, cheaper, better, stranger. If every app embeds model choice directly, each upgrade becomes a scavenger hunt. A control plane lets the company change routing rules in one place, run evals, compare outputs, and roll forward or back without rewriting every workflow.

There is a human side too. Model routing creates a shared language between finance, security, product, engineering, and operations. Finance can see where premium intelligence is justified. Security can see which data leaves which boundary. Product can protect customer-facing quality. Operators can ask whether a workflow is overpaying or underpowered.

The routing layer should not become a committee. It should be a policy engine with evidence.

Use the cheap model when cheap is enough. Use the strong model when quality, risk, or leverage warrants it. Use specialized models where specialization actually helps. Revisit routing when evals, costs, latency, or business value change.

The point is simple: model choice is too important to leave scattered across scripts and product settings.

Once AI becomes operational infrastructure, models become resources to route, not mascots to cheer for.


This is part 4 of 10 in The AI Control Plane.