Not every product problem should become an AI feature.
Some problems need a better workflow. Some need clearer information architecture. Some need rules. Some need a human approval step. Some need fewer features, not a model.
AI is useful when model behavior creates leverage that a deterministic product cannot easily provide. It is wasteful when it adds uncertainty, cost, latency, and support burden to a problem that was already solvable.
Problem selection is the first serious AI product decision.
Start with the job, not the model
The wrong question is: "Where can we add AI?"
The better question is: "Where are users stuck because the product cannot handle language, ambiguity, context, variation, or judgment at scale?"
AI tends to help when the work involves messy inputs, fuzzy categorization, natural language, synthesis, pattern recognition, personalized generation, or decisions that require context from many sources.
It tends to hurt when the work requires exactness, deterministic rules, low-latency interaction, strict auditability, or simple structured workflows that a normal product can handle.
A refund policy engine may need rules, not a model: if the customer bought within 30 days, the item is unopened, and the region allows returns, approve; otherwise route to support. Adding a model would make a clear entitlement harder to audit.
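For contrast, here is that entitlement as a minimal sketch in plain code. The field names and policy values are illustrative, not a real policy:

```python
from datetime import date, timedelta

RETURN_WINDOW = timedelta(days=30)
RETURNABLE_REGIONS = {"US", "CA", "EU"}  # illustrative policy data

def refund_decision(purchase_date: date, unopened: bool, region: str,
                    today: date | None = None) -> str:
    """Deterministic refund entitlement: every branch is auditable."""
    today = today or date.today()
    within_window = (today - purchase_date) <= RETURN_WINDOW
    if within_window and unopened and region in RETURNABLE_REGIONS:
        return "approve"
    return "route_to_support"
```

Every input and branch is inspectable and testable. That is exactly the auditability a model would blur.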
A research synthesis workflow may deserve AI because the value comes from ambiguity: comparing messy interviews, product feedback, usage notes, and support tickets to find patterns a filter cannot see. A permission system should never be probabilistic. A sales call summary can be, because context and judgment create leverage there.
The five tests for AI-worthy problems
A problem is more likely to be worth solving with AI if it passes five tests.
1. The user outcome is clear
"Help users be more productive" is not clear.
"Reduce the time from customer call to clean CRM update" is clear.
The tighter the outcome, the easier it is to design the workflow, measure quality, and decide what failure means.
2. The model has real leverage
AI should do something meaningfully hard for traditional software: interpret text, extract intent, summarize evidence, generate useful drafts, compare cases, adapt to context, or route ambiguous work.
If a dropdown, filter, template, or rule gets you most of the value with less risk, use that.
3. The workflow can absorb uncertainty
AI output can be wrong. The product needs a place for review, correction, rejection, or escalation.
If a wrong answer directly triggers irreversible harm, the feature needs strong controls or should not exist in that form.
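One common shape for absorbing uncertainty is confidence-based routing. A minimal sketch, assuming the model exposes a usable confidence score; the thresholds and action names are invented for illustration:

```python
def route(output: str, confidence: float) -> str:
    """Route a model output by confidence. Thresholds are illustrative,
    not tuned values. Nothing irreversible ships without a human."""
    if confidence >= 0.90:
        return "suggest"       # shown as a draft the user can accept or edit
    if confidence >= 0.60:
        return "review_queue"  # a human checks before anything is applied
    return "escalate"          # fall back to the manual workflow
```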
4. The data is available and legitimate
The model needs context. That context must be accessible, current, permissioned, and legally usable.
A feature that needs sensitive enterprise data but cannot satisfy retention, deletion, audit, or consent requirements is not ready.
5. The economics work
A feature can be technically possible and commercially bad.
If every successful outcome costs too much in model calls, human review, support time, or latency, the product may not scale. AI product economics are product requirements, not finance cleanup.
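A back-of-the-envelope check, with every number invented for illustration: load all the costs onto an attempt, then divide by how often attempts actually succeed.

```python
def cost_per_successful_outcome(model_cost: float, review_cost: float,
                                review_rate: float, success_rate: float) -> float:
    """Fully loaded cost of one successful outcome.
    All inputs are per-attempt figures; the values below are made up."""
    attempt_cost = model_cost + review_rate * review_cost
    return attempt_cost / success_rate

# e.g. $0.04 in model calls, 30% of outputs get a $0.50 human review,
# and 80% of attempts end in a usable result:
print(cost_per_successful_outcome(0.04, 0.50, 0.30, 0.80))  # ≈ $0.24
```

Run that number against what a successful outcome is worth before the feature reaches the roadmap.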
Artifact: AI feature selection scorecard
Use this scorecard before committing roadmap capacity.
```text
AI Feature Selection Scorecard
Rate each from 1 (weak) to 5 (strong).
A. User value
- Clear, frequent, painful user job: __ / 5
- Direct business value or willingness to pay: __ / 5
- Meaningful improvement over current workflow: __ / 5
B. AI leverage
- Requires language/context/judgment/variation: __ / 5
- Hard to solve well with rules or normal UX: __ / 5
- Model output can be constrained and evaluated: __ / 5
C. Workflow fit
- Fits an existing workflow or creates a clearly better one: __ / 5
- Output has an obvious next action: __ / 5
- Users can review/correct without excessive burden: __ / 5
D. Risk and trust
- Failure is detectable and recoverable: __ / 5
- Confidence can be communicated honestly: __ / 5
- Permissions and audit needs are manageable: __ / 5
E. Data and economics
- Required data is available and consented: __ / 5
- Cost per successful outcome is viable: __ / 5
- Latency is acceptable for the workflow: __ / 5
Decision guide:
- 60-75: Strong candidate
- 45-59: Prototype with explicit risk assumptions
- 30-44: Redesign the problem or use non-AI methods
- Below 30: Do not build as an AI feature
```
The score is not the decision. It forces the discussion.
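If the team wants the scorecard enforced in tooling rather than a doc, a minimal sketch, with bands copied from the decision guide above:

```python
def scorecard_decision(ratings: list[int]) -> str:
    """Map the 15 scorecard ratings (1-5 each) to the decision guide bands."""
    assert len(ratings) == 15 and all(1 <= r <= 5 for r in ratings)
    total = sum(ratings)
    if total >= 60:
        return "strong candidate"
    if total >= 45:
        return "prototype with explicit risk assumptions"
    if total >= 30:
        return "redesign the problem or use non-AI methods"
    return "do not build as an AI feature"
```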
Beware executive theater
Some AI features are built because the company wants to say it has AI.
That is not automatically wrong. Markets care about signals. Buyers ask about roadmaps. Teams need to learn.
But do not confuse theater with product strategy. If a feature exists mainly to satisfy a narrative, label it honestly. Keep it small. Do not let it consume the roadmap that should go to real user value.
Human-in-the-loop is not a magic fix
Teams often use "human in the loop" to wave away risk.
Who is the human? Do they have time? Do they understand the model failure modes? Are they accountable? Is review faster than doing the original task? What happens when volume spikes? How does their feedback improve the product?
A human review step can be excellent product design. It can also be an expensive bottleneck hidden inside a roadmap slide.
Design it like a real workflow.
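Taking "what happens when volume spikes" seriously can be simple arithmetic. A sketch with placeholder numbers, not benchmarks:

```python
def review_capacity_ok(items_per_day: int, review_minutes: float,
                       reviewers: int, hours_per_reviewer: float = 6.0) -> bool:
    """Does review capacity cover the volume? All inputs are assumptions."""
    demand_hours = items_per_day * review_minutes / 60
    capacity_hours = reviewers * hours_per_reviewer
    return demand_hours <= capacity_hours

# 2,000 items a day at 3 minutes each needs 100 review hours;
# five reviewers at six focused hours each supply 30. The loop breaks.
print(review_capacity_ok(2000, 3, 5))  # False
```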
Choose problems with correction paths
The best early AI product problems often have clear correction loops.
A generated email can be edited. A ticket category can be changed. A research summary can be accepted, rejected, or annotated. A recommendation can be explained and overridden.
Correction paths help users trust the product and help the team learn where the model fails.
If users cannot correct the system, the product cannot improve from real use.
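Correction paths only teach the team something if the product records them. A hedged sketch of the event shape; every field name here is illustrative:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class CorrectionEvent:
    """One user correction of a model output; the schema is illustrative."""
    feature: str            # e.g. "ticket_categorization"
    model_output: str
    action: str             # "accepted" | "edited" | "rejected" | "overridden"
    corrected_output: str | None = None
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

# Over time, the accept/edit/reject mix per feature shows where the model fails.
```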
The practical standard
AI is worth using when it creates meaningful leverage on a real job, inside a workflow that can handle uncertainty, with data rights and economics that work.
That is a narrower set of problems than the hype suggests.
Good. Narrow is where products get built.
