AI products need a harder readiness test than "the demo worked."

A demo proves possibility. An audit tests whether the product can create repeatable value in the real world: with messy users, imperfect data, latency, cost, permissions, uncertainty, support tickets, enterprise buyers, and model drift.

Use this audit before shipping, scaling, buying, or fixing an AI product.

1. Problem selection

Start here because everything else depends on it.

```text
Problem Selection

  • What user job does this product improve?
  • Is the pain frequent, expensive, risky, or strategically important?
  • Why is AI needed instead of rules, workflow design, templates, search, or humans?
  • What is the non-AI baseline?
  • What does success look like for the user?
  • What would make this feature not worth building?
```

If the problem is vague, the AI will make it vaguer.

2. Workflow fit

AI should fit an existing workflow or create a meaningfully better one.

```text
Workflow-Fit Map

Before AI:

  • Trigger:
  • User goal:
  • Inputs:
  • Steps:
  • Tools involved:
  • Output:
  • Pain points:

With AI:

  • Where does AI intervene?
  • What context is gathered automatically?
  • What does the user review?
  • What action happens next?
  • What is faster, better, cheaper, or newly possible?
  • What new risk or friction is introduced?
```

If the output has nowhere to go, the feature is probably a toy.

3. UX and trust

The product must help users understand, control, and recover from model behavior.

```text
UX and Trust

  • Does the product distinguish verified answers, drafts, suggestions, and guesses?
  • Are sources, assumptions, or affected fields visible when needed?
  • Can users edit, reject, override, teach, undo, and recover?
  • Does the product avoid overstating certainty?
  • Does the feature need to be visible AI, or should it be invisible assistance?
  • Are high-risk actions gated by review or approval?
```

Trust is built through control, not branding.

4. Evals and release gates

No evals, no serious AI product.

```text
Evals

  • What are the quality dimensions?
  • What is the minimum bar for release?
  • Is there a representative gold set?
  • Are edge cases and known failures included?
  • Are regression tests run for model, prompt, retrieval, or policy changes?
  • Where is human review required?
  • Who owns evals after launch?
```

If quality is judged by vibes, the roadmap is guessing.
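
The release gate above can be sketched in a few lines. Everything here is illustrative: the dimension names, the bars, and the shape of `results` are assumptions standing in for a real eval harness.

```python
# Minimal release-gate sketch: block release when any quality dimension
# scores below its minimum bar on the gold set. Names and bars are illustrative.

def passes_release_gate(results, bars):
    """results: {dimension: fraction of gold cases passed}; bars: minimum per dimension."""
    failures = {dim: score for dim, score in results.items() if score < bars.get(dim, 1.0)}
    return len(failures) == 0, failures

bars = {"faithfulness": 0.95, "format": 0.99, "refusal_correctness": 0.90}
results = {"faithfulness": 0.97, "format": 1.00, "refusal_correctness": 0.85}

ok, failures = passes_release_gate(results, bars)
print(ok, failures)  # False {'refusal_correctness': 0.85}
```

The point of encoding the bar is that a failing dimension blocks the change automatically instead of opening a debate.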

5. Model and architecture choice

The model stack should match the product promise.

```text
Model Strategy

  • Why this model or model mix?
  • Where are rules better than AI?
  • Where is retrieval required?
  • Where is a fine-tune justified?
  • Where is human review required?
  • What are latency and cost limits?
  • What is the fallback if the model or vendor changes behavior?
```

Model choice is a product decision because it shapes UX, economics, privacy, and durability.
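
A routing sketch makes the "rules vs. model vs. human" split concrete. The request types, the confidence threshold, and the `call_model` stub are all assumptions for illustration, not a real API.

```python
# Illustrative routing sketch: prefer deterministic rules, fall back to a
# model, and escalate to human review when confidence is low.

def route(request):
    if request.get("type") == "date_math":        # deterministic task: rules beat AI
        return {"handler": "rules"}
    answer = call_model(request)                  # hypothetical model call
    if answer["confidence"] < 0.7:                # low confidence: human review
        return {"handler": "human_review", "draft": answer["text"]}
    return {"handler": "model", "text": answer["text"]}

def call_model(request):
    # Stub standing in for a real model/vendor call; always low confidence here.
    return {"text": "draft answer", "confidence": 0.55}

print(route({"type": "free_text"}))  # {'handler': 'human_review', 'draft': 'draft answer'}
```

Keeping the routing logic in the product, not buried in prompts, is what makes vendor or model changes survivable.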

6. Data rights and learning loop

Data strategy must be explicit.

```text
Data and Learning

  • What data is used for inference?
  • What data is stored?
  • What data can be used for evals or improvement?
  • What requires consent or contractual permission?
  • How do users or admins delete data?
  • What feedback signal improves the product?
  • Is there a real learning loop or just a slide claiming a flywheel?
```

Data without usage rights and a learning process is not a moat.

7. Economics and performance

The feature must work as a business.

```text
Economics and Performance

  • What is cost per invocation?
  • What is cost per successful outcome?
  • What is the human review cost?
  • What latency is acceptable for this workflow?
  • What happens at 10x usage?
  • What limits, pricing, or packaging does the cost structure require?
```

If the unit economics only work in a prototype, the product is not ready to scale.
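
The gap between cost per invocation and cost per successful outcome is worth computing explicitly. All the numbers below are illustrative assumptions, not real rates.

```python
# Back-of-envelope unit economics with illustrative numbers.
tokens_in, tokens_out = 3_000, 800
price_in, price_out = 3.00 / 1_000_000, 15.00 / 1_000_000   # assumed $/token rates
cost_per_invocation = tokens_in * price_in + tokens_out * price_out

success_rate = 0.6     # assumed fraction of invocations yielding a usable outcome
review_cost = 0.05     # assumed human review cost amortized per invocation
cost_per_success = (cost_per_invocation + review_cost) / success_rate

print(f"{cost_per_invocation:.4f}")  # 0.0210
print(f"{cost_per_success:.4f}")     # 0.1183
```

Note that the human review line, not the model call, dominates here; that is a common pattern, and it is invisible if you only track API spend.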

8. Enterprise readiness

Enterprise readiness is not a sales deck. It is product surface area.

```text
Enterprise-Readiness Checklist

  • Role-based permissions for who can invoke AI, approve outputs, and change settings
  • Workspace or tenant-level admin controls to enable, disable, or limit AI features
  • Data retention settings with deletion behavior documented in-product
  • Training-data opt-out where applicable
  • Audit logs for AI actions, generated outputs, admin setting changes, and approvals
  • Source restrictions for retrieval, including approved collections and blocked sources
  • Explainability or evidence appropriate to risk, such as citations, affected fields, or decision rationale
  • Safety policies for prohibited actions, sensitive data, and high-risk recommendations
  • Compliance posture documented for security, privacy, vendor processing, and data residency
  • Human approval controls for sensitive actions
  • Incident and rollback process with customer communication rules
```

Buyers need to know the system can be governed from the product, not only from a security questionnaire.
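
Governable-from-the-product means these controls exist as real settings. A minimal sketch of tenant-level settings, with entirely illustrative field names rather than a real schema:

```python
# Sketch of tenant-level AI governance settings surfaced in-product.
# Every field name here is illustrative, not a real configuration schema.
tenant_ai_settings = {
    "ai_features_enabled": True,
    "allowed_roles": {"invoke": ["member", "admin"], "approve": ["admin"]},
    "retention_days": 30,                     # generated outputs deleted after this
    "training_opt_out": True,                 # tenant data excluded from training
    "retrieval_sources": {"allowed": ["approved_kb"], "blocked": ["public_web"]},
    "require_approval_for": ["bulk_edit", "external_send"],
    "audit_log_enabled": True,
}

def can_invoke(role, settings):
    # Permission check an AI entry point would run before doing anything.
    return settings["ai_features_enabled"] and role in settings["allowed_roles"]["invoke"]

print(can_invoke("guest", tenant_ai_settings))  # False
```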

9. Operations and support

AI products require operating models.

```text
Operations and Support

  • What are the common failure modes?
  • How are AI-related support tickets categorized?
  • Can support inspect enough context to help safely?
  • What gets escalated to product, engineering, ML, or ops?
  • How are model drift and vendor drift monitored?
  • What is the rollback path?
  • Who communicates meaningful behavior changes to customers?
```

If nobody owns failures after launch, users will.
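
Drift monitoring can start very simply: re-run a fixed probe set on a schedule and alert when agreement with pinned reference outputs drops. The probes and the threshold below are illustrative.

```python
# Minimal drift-monitoring sketch: compare fresh outputs on a fixed probe set
# against pinned reference outputs; flag when agreement falls below a threshold.

def drift_check(probe_outputs, reference_outputs, threshold=0.9):
    matches = sum(1 for p, r in zip(probe_outputs, reference_outputs) if p == r)
    agreement = matches / len(reference_outputs)
    return agreement, agreement >= threshold

ref = ["A", "B", "C", "A", "B"]      # pinned outputs from the approved model version
new = ["A", "B", "C", "A", "C"]      # one probe changed after a vendor update
print(drift_check(new, ref))         # (0.8, False) -> trigger rollback review
```

Exact-match agreement is crude; the same scaffold works with semantic similarity or eval scores, but even the crude version catches silent vendor changes.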

10. Competitive durability

If a competitor can call the same model API, what makes this product durable?

```text
Durability

  • Workflow ownership
  • Proprietary or permissioned data loops
  • Distribution advantage
  • Deep integrations
  • Strong evals and domain quality bars
  • Trust and governance controls
  • Operational learning from support and usage
  • Customer-specific configuration or context
  • Switching costs created by real value, not lock-in games
```

The moat is rarely the model alone.

Product audit worksheet

Use this as the summary page.

Optional scoring: mark each section Green = 2, Yellow = 1, Red = 0. Weight UX/trust, evals, data rights, and enterprise readiness double for high-risk or enterprise products. A low-risk internal drafting tool might ship at 14/20 with no red sections; a customer-facing finance explainer should require 24/28, no red sections, and explicit approval from product, security, and support.
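
The scoring rule can be written down directly so that teams compute it the same way every time. The section keys and sample marks below are illustrative.

```python
# Scoring sketch for the worksheet: Green=2, Yellow=1, Red=0, with four
# sections double-weighted for high-risk or enterprise products.

SCORES = {"green": 2, "yellow": 1, "red": 0}
DOUBLE_WEIGHTED = {"ux_trust", "evals", "data_rights", "enterprise_readiness"}

def audit_score(marks, high_risk=False):
    raw = sum(SCORES[m] for m in marks.values())
    weighted = sum(SCORES[m] * (2 if high_risk and s in DOUBLE_WEIGHTED else 1)
                   for s, m in marks.items())
    reds = [s for s, m in marks.items() if m == "red"]
    return raw, weighted, reds

marks = {
    "problem": "green", "workflow": "green", "ux_trust": "yellow",
    "evals": "green", "model": "green", "data_rights": "yellow",
    "economics": "green", "enterprise_readiness": "red",
    "operations": "green", "durability": "yellow",
}
print(audit_score(marks, high_risk=True))  # (15, 19, ['enterprise_readiness'])
```

With the four double-weighted sections, the high-risk maximum is 28, which is where the 24/28 bar above comes from.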

```text
Building AI Products Audit Worksheet

Product / feature:

Owner:

Date:

  1. Problem selection: Green / Yellow / Red

Notes:

  2. Workflow fit: Green / Yellow / Red

Notes:

  3. UX and trust: Green / Yellow / Red

Notes:

  4. Evals and release gates: Green / Yellow / Red

Notes:

  5. Model strategy: Green / Yellow / Red

Notes:

  6. Data rights and learning loop: Green / Yellow / Red

Notes:

  7. Economics and performance: Green / Yellow / Red

Notes:

  8. Enterprise readiness: Green / Yellow / Red

Notes:

  9. Operations and support: Green / Yellow / Red

Notes:

  10. Competitive durability: Green / Yellow / Red

Notes:

Score:

  • Raw score:
  • Weighted score, if used:
  • Red sections that block launch:

Decision:

  • Ship
  • Ship behind limited rollout
  • Prototype only
  • Fix before launch
  • Do not build

Top three risks:

1.
2.
3.

Next three actions:

1.
2.
3.
```

The practical standard

A serious AI product is not measured by how impressive it is in a controlled demo.

It is measured by whether it creates value repeatedly, handles uncertainty honestly, respects data rights, passes quality gates, fits real workflows, and can be operated after launch.

That is the bar.

Everything else is theater.