Traditional software can fail, but it usually fails in familiar ways: a bug, an outage, a missing permission, a validation error.

AI products fail differently. They can be fluent and wrong. They can be partially right. They can be right yesterday and worse tomorrow. They can answer beyond their evidence. They can change behavior when a vendor updates a model underneath you.

That does not make AI unusable. It means uncertainty is a design material.

Design from failure modes outward

Most teams design the happy path first. For AI products, that is dangerous.

Start by listing how the model can fail.

It may hallucinate facts. It may omit important context. It may misclassify edge cases. It may produce an answer that sounds more certain than the evidence allows. It may use stale data. It may expose information the user should not see. It may be too slow. It may cost too much. It may perform differently after a model update.

Each failure mode needs a product response.

For an AI support assistant, the inventory might be concrete:

  • cites the wrong help article
  • misses a customer's paid plan or region-specific policy
  • drafts a refund promise the company cannot honor
  • exposes internal-only notes to the customer-facing reply
  • sounds certain when the case needs human approval
  • keeps suggesting the same bad macro after agents reject it
  • gets slower or more expensive during a ticket spike

This list is not UX polish. It is the design input for constraints, review rules, eval cases, escalation thresholds, and rollout limits.

Some failures require prevention: permissions, retrieval constraints, structured outputs, blocked actions. Some require detection: evals, confidence thresholds, anomaly monitoring, human review. Some require recovery: edit, undo, escalate, regenerate, or route to support.
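To make this concrete, here is a minimal sketch of a failure-mode inventory as a design artifact, mapping each failure to prevention, detection, or recovery. The names (`FailureMode`, `Response`, the example mechanisms) are illustrative assumptions, not a real library or your team's actual eval harness.

```python
# Hypothetical sketch: a failure-mode inventory as a design input,
# mapping each failure to a response type and a concrete mechanism.
from dataclasses import dataclass
from enum import Enum


class Response(Enum):
    PREVENT = "prevent"   # permissions, retrieval constraints, blocked actions
    DETECT = "detect"     # evals, confidence thresholds, anomaly monitoring
    RECOVER = "recover"   # edit, undo, escalate, regenerate, route to support


@dataclass
class FailureMode:
    description: str
    response: Response
    mechanism: str        # the constraint, eval case, or control that answers it


SUPPORT_ASSISTANT_FAILURES = [
    FailureMode("cites the wrong help article", Response.DETECT,
                "retrieval eval set with known question-to-article pairs"),
    FailureMode("drafts a refund promise the company cannot honor", Response.PREVENT,
                "block refund language unless a policy lookup confirms eligibility"),
    FailureMode("exposes internal-only notes in the customer-facing reply", Response.PREVENT,
                "strip internal fields before the model sees the ticket"),
    FailureMode("sounds certain when the case needs human approval", Response.RECOVER,
                "route low-confidence or high-risk drafts to an agent review queue"),
]
```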

Do not make uncertainty look like certainty

Bad AI UX presents every answer with the same confidence.

A sourced answer from approved data, a rough inference from incomplete context, and a speculative suggestion should not look identical.

Trust calibration starts with honest presentation. The product should not pretend certainty it does not have.

This does not mean dumping probability scores onto users. Most users do not want a decimal. They want to know whether the product is giving them a verified result, a best-effort draft, a recommendation, or a guess that needs review.

Use language and interface states that match the strength of the answer.

Artifact: uncertainty UX patterns

```text

Uncertainty UX Patterns

  1. Evidence-backed answer

Use when the model can cite approved sources or structured data.

Interface pattern: answer + source links + timestamp.

Language: "Based on these records..."

  2. Draft for review

Use when the model creates content the user must approve.

Interface pattern: editable draft + accept/edit/reject controls.

Language: "Draft suggested reply. Review before sending."

  3. Low-confidence classification

Use when the model is choosing among categories with weak signal.

Interface pattern: suggested label + alternatives + quick correction.

Language: "Best match: Billing. Other possible matches: Refund, Account Access."

  4. Missing-context state

Use when the model lacks required information.

Interface pattern: ask for specific missing input or show disabled action.

Language: "I need the contract effective date before I can compare renewal terms."

  5. Escalation required

Use when the risk exceeds the product's autonomy.

Interface pattern: route to human, queue, or approval flow.

Language: "This case requires review before action."

  6. Silent assist

Use when AI improves the workflow without needing user attention.

Interface pattern: improved ordering, prefilled fields, deduped records.

Language: no AI label unless transparency is required.

```

The point is not to make the UI nervous. The point is to make the product honest.
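As a sketch of how the patterns above might be selected in practice, the snippet below maps the signals a product already has (sources, confidence, missing inputs, risk) to a presentation state. The thresholds and field names are assumptions for illustration, not a prescribed API.

```python
# Illustrative sketch: choose a presentation pattern from available signals
# instead of rendering every answer with the same confidence.
from enum import Enum


class Pattern(Enum):
    EVIDENCE_BACKED = "evidence-backed answer"
    DRAFT_FOR_REVIEW = "draft for review"
    LOW_CONFIDENCE = "low-confidence classification"
    MISSING_CONTEXT = "missing-context state"
    ESCALATE = "escalation required"


def choose_pattern(has_sources: bool, confidence: float,
                   missing_fields: list[str], risk: str) -> Pattern:
    """Match the interface state to the strength of the answer."""
    if risk == "high":
        return Pattern.ESCALATE              # risk exceeds the product's autonomy
    if missing_fields:
        return Pattern.MISSING_CONTEXT       # ask for the specific missing input
    if has_sources and confidence >= 0.8:
        return Pattern.EVIDENCE_BACKED       # answer + source links + timestamp
    if confidence < 0.5:
        return Pattern.LOW_CONFIDENCE        # suggested label + alternatives
    return Pattern.DRAFT_FOR_REVIEW          # editable draft + accept/edit/reject
```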

Failure should be recoverable

Users will forgive imperfect AI if they can recover quickly.

They will not forgive a product that confidently damages their work and gives them no way back.

Correction controls are not optional. Users need to edit, reject, override, teach, undo, and recover.

For example, an AI scheduling assistant should show proposed changes before sending calendar invites. A finance explanation tool should let users inspect the underlying metrics. A code assistant should show diffs. A customer support assistant should let agents edit before sending and flag bad suggestions.
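One way to frame those controls is a propose-confirm-undo wrapper around any AI-suggested action: nothing irreversible happens without confirmation, and there is always a way back. This is a minimal sketch with hypothetical names, not a specific framework.

```python
# Minimal sketch of a "propose, confirm, undo" wrapper for an AI-suggested action.
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProposedChange:
    summary: str                  # shown to the user before anything is executed
    apply: Callable[[], None]     # performs the change (e.g., send the invite)
    undo: Callable[[], None]      # reverses it (e.g., cancel the invite)
    applied: bool = False

    def confirm(self) -> None:
        self.apply()
        self.applied = True

    def revert(self) -> None:
        if self.applied:
            self.undo()
            self.applied = False
```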

Recovery is part of trust.

Model drift is a product risk

With AI products, quality can shift without your team changing a line of code.

A vendor may update a model. Retrieval data may change. User behavior may change. A prompt that worked for one distribution of inputs may degrade as usage expands. A fine-tuned model may age as policy or product details change.

This is not just an ML concern. It is product quality changing underneath users.

Teams need monitoring, regression sets, release gates for model changes, and a clear owner when quality drops.
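A release gate can be as simple as re-running a fixed regression set against the candidate model and blocking rollout if quality drops below the previous baseline. The sketch below assumes you already have an eval dataset and a scoring function; the names are placeholders.

```python
# Hedged sketch of a release gate for model changes: hold the line on a
# fixed regression set before any model or prompt update ships.
def release_gate(candidate_model, regression_cases, score_fn, baseline: float,
                 tolerance: float = 0.02) -> bool:
    """Return True only if the candidate keeps quality within tolerance of the baseline."""
    scores = [score_fn(candidate_model, case) for case in regression_cases]
    candidate_score = sum(scores) / len(scores)
    # Fail closed: a drop beyond tolerance blocks the rollout and goes to the quality owner.
    return candidate_score >= baseline - tolerance
```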

Design autonomy in levels

Do not make every AI feature either manual or fully autonomous.

Use levels.

Level 1: suggest. The product proposes; the user acts.

Level 2: draft. The product creates work; the user approves.

Level 3: execute with review. The product takes action after explicit confirmation.

Level 4: execute within guardrails. The product acts automatically for low-risk cases and escalates exceptions.

Level 5: autonomous operation. The product acts continuously with monitoring, permissions, and rollback.

Most products should earn higher autonomy slowly. Autonomy is not a launch setting. It is a trust contract.
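Expressed in code, autonomy becomes an explicit level rather than a boolean flag. The level names below follow the list above; the routing rule is an illustrative assumption.

```python
# Sketch of autonomy as explicit levels, with a guard that decides when the
# product may act without a human in the loop.
from enum import IntEnum


class Autonomy(IntEnum):
    SUGGEST = 1                # the product proposes; the user acts
    DRAFT = 2                  # the product creates work; the user approves
    EXECUTE_WITH_REVIEW = 3    # acts only after explicit confirmation
    EXECUTE_IN_GUARDRAILS = 4  # acts automatically for low-risk cases, escalates the rest
    AUTONOMOUS = 5             # continuous operation with monitoring and rollback


def may_act_automatically(level: Autonomy, risk: str) -> bool:
    """Only guardrailed or autonomous features act unattended, and only on low-risk cases."""
    if level >= Autonomy.AUTONOMOUS:
        return True
    if level == Autonomy.EXECUTE_IN_GUARDRAILS:
        return risk == "low"
    return False
```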

The practical standard

AI products should be designed as if the model will sometimes be wrong, slow, expensive, stale, or overconfident.

Because it will.

The product team's job is not to eliminate all uncertainty. It is to place uncertainty where users, systems, and operators can handle it.