AI products need a harder readiness test than "the demo worked."

A demo proves possibility. An audit tests whether the product can create repeatable value in the real world: with messy users, imperfect data, latency, cost, permissions, uncertainty, support tickets, enterprise buyers, and model drift.

Use this audit before shipping, scaling, buying, or fixing an AI product.

1. Problem selection

Start here because everything else depends on it.

```text
Problem Selection

  • What user job does this product improve?
  • Is the pain frequent, expensive, risky, or strategically important?
  • Why is AI needed instead of rules, workflow design, templates, search, or humans?
  • What is the non-AI baseline?
  • What does success look like for the user?
  • What would make this feature not worth building?
```

If the problem is vague, the AI will make it vaguer.

2. Workflow fit

AI should fit an existing workflow or create a meaningfully better one.

```text
Workflow-Fit Map

Before AI:

  • Trigger:
  • User goal:
  • Inputs:
  • Steps:
  • Tools involved:
  • Output:
  • Pain points:

With AI:

  • Where does AI intervene?
  • What context is gathered automatically?
  • What does the user review?
  • What action happens next?
  • What is faster, better, cheaper, or newly possible?
  • What new risk or friction is introduced?
```

If the output has nowhere to go, the feature is probably a toy.

3. UX and trust

The product must help users understand, control, and recover from model behavior.

```text
UX and Trust

  • Does the product distinguish verified answers, drafts, suggestions, and guesses?
  • Are sources, assumptions, or affected fields visible when needed?
  • Can users edit, reject, override, teach, undo, and recover?
  • Does the product avoid overstating certainty?
  • Does the feature need to be visible AI, or should it be invisible assistance?
  • Are high-risk actions gated by review or approval?
```

Trust is built through control, not branding.

4. Evals and release gates

No evals, no serious AI product.

```text
Evals

  • What are the quality dimensions?
  • What is the minimum bar for release?
  • Is there a representative gold set?
  • Are edge cases and known failures included?
  • Are regression tests run for model, prompt, retrieval, or policy changes?
  • Where is human review required?
  • Who owns evals after launch?
```

If quality is judged by vibes, the roadmap is guessing.
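
The release gate above can be sketched in a few lines. Everything here is illustrative: the dimension names, the bars, and the shape of `results` are assumptions standing in for a real eval harness.

```python
# Minimal release-gate sketch: block release when any quality dimension
# scores below its minimum bar on the gold set. Names and bars are illustrative.

def passes_release_gate(results, bars):
    """results: {dimension: fraction of gold cases passed}; bars: minimum per dimension."""
    failures = {dim: score for dim, score in results.items() if score < bars.get(dim, 1.0)}
    return len(failures) == 0, failures

bars = {"faithfulness": 0.95, "format": 0.99, "refusal_correctness": 0.90}
results = {"faithfulness": 0.97, "format": 1.00, "refusal_correctness": 0.85}

ok, failures = passes_release_gate(results, bars)
print(ok, failures)  # False {'refusal_correctness': 0.85}
```

The point of encoding the bar is that a failing dimension blocks the change automatically instead of opening a debate.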

5. Model and architecture choice

The model stack should match the product promise.

```text
Model Strategy

  • Why this model or model mix?
  • Where are rules better than AI?
  • Where is retrieval required?
  • Where is a fine-tune justified?
  • Where is human review required?
  • What are latency and cost limits?
  • What is the fallback if the model or vendor changes behavior?
```

Model choice is a product decision because it shapes UX, economics, privacy, and durability.
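
A routing sketch makes the "rules vs. model vs. human" split concrete. The request types, the confidence threshold, and the `call_model` stub are all assumptions for illustration, not a real API.

```python
# Illustrative routing sketch: prefer deterministic rules, fall back to a
# model, and escalate to human review when confidence is low.

def route(request):
    if request.get("type") == "date_math":        # deterministic task: rules beat AI
        return {"handler": "rules"}
    answer = call_model(request)                  # hypothetical model call
    if answer["confidence"] < 0.7:                # low confidence: human review
        return {"handler": "human_review", "draft": answer["text"]}
    return {"handler": "model", "text": answer["text"]}

def call_model(request):
    # Stub standing in for a real model/vendor call; always low confidence here.
    return {"text": "draft answer", "confidence": 0.55}

print(route({"type": "free_text"}))  # {'handler': 'human_review', 'draft': 'draft answer'}
```

Keeping the routing logic in the product, not buried in prompts, is what makes vendor or model changes survivable.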

6. Data rights and learning loop

Data strategy must be explicit.

```text
Data and Learning

  • What data is used for inference?
  • What data is stored?
  • What data can be used for evals or improvement?
  • What requires consent or contractual permission?
  • How do users or admins delete data?
  • What feedback signal improves the product?
  • Is there a real learning loop or just a slide claiming a flywheel?
```

Data without usage rights and a learning process is not a moat.

7. Economics and performance

The feature must work as a business.

```text
Economics and Performance

  • What is cost per invocation?
  • What is cost per successful outcome?
  • What is the human review cost?
  • What latency is acceptable for this workflow?
  • What happens at 10x usage?
  • What limits, pricing, or packaging does the cost structure require?
```

If the unit economics only work in a prototype, the product is not ready to scale.
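
The gap between cost per invocation and cost per successful outcome is worth computing explicitly. All the numbers below are illustrative assumptions, not real rates.

```python
# Back-of-envelope unit economics with illustrative numbers.
tokens_in, tokens_out = 3_000, 800
price_in, price_out = 3.00 / 1_000_000, 15.00 / 1_000_000   # assumed $/token rates
cost_per_invocation = tokens_in * price_in + tokens_out * price_out

success_rate = 0.6     # assumed fraction of invocations yielding a usable outcome
review_cost = 0.05     # assumed human review cost amortized per invocation
cost_per_success = (cost_per_invocation + review_cost) / success_rate

print(f"{cost_per_invocation:.4f}")  # 0.0210
print(f"{cost_per_success:.4f}")     # 0.1183
```

Note that the human review line, not the model call, dominates here; that is a common pattern, and it is invisible if you only track API spend.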

8. Enterprise readiness

Enterprise readiness is not a sales deck. It is product surface area.

```text
Enterprise-Readiness Checklist

  • Role-based permissions for who can invoke AI, approve outputs, and change settings
  • Workspace or tenant-level admin controls to enable, disable, or limit AI features
  • Data retention settings with deletion behavior documented in-product
  • Training-data opt-out where applicable
  • Audit logs for AI actions, generated outputs, admin setting changes, and approvals
  • Source restrictions for retrieval, including approved collections and blocked sources
  • Explainability or evidence appropriate to risk, such as citations, affected fields, or decision rationale
  • Safety policies for prohibited actions, sensitive data, and high-risk recommendations
  • Compliance posture documented for security, privacy, vendor processing, and data residency
  • Human approval controls for sensitive actions
  • Incident and rollback process with customer communication rules
```

Buyers need to know the system can be governed from the product, not only from a security questionnaire.
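
Governable-from-the-product means these controls exist as real settings. A minimal sketch of tenant-level settings, with entirely illustrative field names rather than a real schema:

```python
# Sketch of tenant-level AI governance settings surfaced in-product.
# Every field name here is illustrative, not a real configuration schema.
tenant_ai_settings = {
    "ai_features_enabled": True,
    "allowed_roles": {"invoke": ["member", "admin"], "approve": ["admin"]},
    "retention_days": 30,                     # generated outputs deleted after this
    "training_opt_out": True,                 # tenant data excluded from training
    "retrieval_sources": {"allowed": ["approved_kb"], "blocked": ["public_web"]},
    "require_approval_for": ["bulk_edit", "external_send"],
    "audit_log_enabled": True,
}

def can_invoke(role, settings):
    # Permission check an AI entry point would run before doing anything.
    return settings["ai_features_enabled"] and role in settings["allowed_roles"]["invoke"]

print(can_invoke("guest", tenant_ai_settings))  # False
```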

9. Operations and support

AI products require operating models.

```text
Operations and Support

  • What are the common failure modes?
  • How are AI-related support tickets categorized?
  • Can support inspect enough context to help safely?
  • What gets escalated to product, engineering, ML, or ops?
  • How are model drift and vendor drift monitored?
  • What is the rollback path?
  • Who communicates meaningful behavior changes to customers?
```

If nobody owns failures after launch, users will.
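
Drift monitoring can start very simply: re-run a fixed probe set on a schedule and alert when agreement with pinned reference outputs drops. The probes and the threshold below are illustrative.

```python
# Minimal drift-monitoring sketch: compare fresh outputs on a fixed probe set
# against pinned reference outputs; flag when agreement falls below a threshold.

def drift_check(probe_outputs, reference_outputs, threshold=0.9):
    matches = sum(1 for p, r in zip(probe_outputs, reference_outputs) if p == r)
    agreement = matches / len(reference_outputs)
    return agreement, agreement >= threshold

ref = ["A", "B", "C", "A", "B"]      # pinned outputs from the approved model version
new = ["A", "B", "C", "A", "C"]      # one probe changed after a vendor update
print(drift_check(new, ref))         # (0.8, False) -> trigger rollback review
```

Exact-match agreement is crude; the same scaffold works with semantic similarity or eval scores, but even the crude version catches silent vendor changes.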

10. Competitive durability

If a competitor can call the same model API, what makes this product durable?

```text
Durability

  • Workflow ownership
  • Proprietary or permissioned data loops
  • Distribution advantage
  • Deep integrations
  • Strong evals and domain quality bars
  • Trust and governance controls
  • Operational learning from support and usage
  • Customer-specific configuration or context
  • Switching costs created by real value, not lock-in games
```

The moat is rarely the model alone.

Product audit worksheet

Use this as the summary page.

Optional scoring: mark each section Green = 2, Yellow = 1, Red = 0. Weight UX/trust, evals, data rights, and enterprise readiness double for high-risk or enterprise products. A low-risk internal drafting tool might ship at 14/20 with no red sections; a customer-facing finance explainer should require 24/28, no red sections, and explicit approval from product, security, and support.
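
The scoring rule can be written down directly so that teams compute it the same way every time. The section keys and sample marks below are illustrative.

```python
# Scoring sketch for the worksheet: Green=2, Yellow=1, Red=0, with four
# sections double-weighted for high-risk or enterprise products.

SCORES = {"green": 2, "yellow": 1, "red": 0}
DOUBLE_WEIGHTED = {"ux_trust", "evals", "data_rights", "enterprise_readiness"}

def audit_score(marks, high_risk=False):
    raw = sum(SCORES[m] for m in marks.values())
    weighted = sum(SCORES[m] * (2 if high_risk and s in DOUBLE_WEIGHTED else 1)
                   for s, m in marks.items())
    reds = [s for s, m in marks.items() if m == "red"]
    return raw, weighted, reds

marks = {
    "problem": "green", "workflow": "green", "ux_trust": "yellow",
    "evals": "green", "model": "green", "data_rights": "yellow",
    "economics": "green", "enterprise_readiness": "red",
    "operations": "green", "durability": "yellow",
}
print(audit_score(marks, high_risk=True))  # (15, 19, ['enterprise_readiness'])
```

With the four double-weighted sections, the high-risk maximum is 28, which is where the 24/28 bar above comes from.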

```text
Building AI Products Audit Worksheet

Product / feature:

Owner:

Date:

  1. Problem selection: Green / Yellow / Red

Notes:

  2. Workflow fit: Green / Yellow / Red

Notes:

  3. UX and trust: Green / Yellow / Red

Notes:

  4. Evals and release gates: Green / Yellow / Red

Notes:

  5. Model strategy: Green / Yellow / Red

Notes:

  6. Data rights and learning loop: Green / Yellow / Red

Notes:

  7. Economics and performance: Green / Yellow / Red

Notes:

  8. Enterprise readiness: Green / Yellow / Red

Notes:

  9. Operations and support: Green / Yellow / Red

Notes:

  10. Competitive durability: Green / Yellow / Red

Notes:

Score:

  • Raw score:
  • Weighted score, if used:
  • Red sections that block launch:

Decision:

  • Ship
  • Ship behind limited rollout
  • Prototype only
  • Fix before launch
  • Do not build

Top three risks:

1.
2.
3.

Next three actions:

1.
2.
3.
```

The practical standard

A serious AI product is not measured by how impressive it is in a controlled demo.

It is measured by whether it creates value repeatedly, handles uncertainty honestly, respects data rights, passes quality gates, fits real workflows, and can be operated after launch.

That is the bar.

Everything else is theater.