The Right Depth for the Problem Series #10: The Problem-Solving Operating System

A company does not need every problem to become a framework.

It does need a shared operating system for deciding how problems move.

Without one, problems travel through personality, politics, and volume. The loudest issue gets attention. The best-connected team gets escalation. The most polished memo looks like the strongest thinking. The most urgent Slack thread defines priority. Small problems get overworked. Big problems get patched.

A problem-solving operating system is not bureaucracy. It is a lightweight way to route problems to the right depth.

The spine

Use the same spine every time, scaled to the size of the problem:

Classify → size risk → define → find constraint → choose mode → assign ownership → solve or stop.

For a small issue, this can take two minutes.

For a major strategic question, it may take weeks.

The consistency is in the questions, not the ceremony.

Intake: what signal arrived?

Every problem starts as a signal.

Customer complaint. Incident. Missed metric. Team frustration. Executive concern. Sales objection. Quality escape. Market surprise. Process failure. People issue.

The intake question is: what did we observe, and what do we know versus assume?

Do not over-process intake. Capture enough to avoid losing the signal:

What happened?
Who is affected?
How urgent is it?
Who saw it first?
What is currently at risk?
Who needs to know now?

Bad-news flow matters. If signals arrive late or softened, fix the route. A healthy intake path rewards early, rough signal more than late, polished certainty.

Triage: what kind of problem is this?

Classify before solving.

Common classes: incident, decision, unknown, execution gap, process failure, people problem, strategy question, customer escalation, technical constraint, market uncertainty.

The classification can change as evidence arrives. That is fine. The first classification prevents the first solution from becoming the only frame.

Risk sizing: how deep should we go?

Use reversibility, blast radius, urgency, and authority.

Can we undo it cheaply?

Who is affected if wrong?

Is time a real constraint?

Can the current owner decide?

This determines depth.

Go fast when the problem is reversible, contained in blast radius, a known failure mode, time-sensitive, and within the owner's authority.

Go slow when it is ambiguous, high stakes, repeated, cross-functional, expensive to reverse, or authority-constrained.
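For teams that like to see the rule written down, here is a minimal sketch of the depth call in Python. The names `RiskProfile` and `choose_depth` are hypothetical. The sketch assumes that any single slow signal outweighs the fast signals, and it adds a "judgment call" result for cases the two rules do not cover; both of those choices are assumptions, not part of the rule as stated.

```python
from dataclasses import dataclass


@dataclass
class RiskProfile:
    # Fast signals: all of them must hold to go fast.
    reversible: bool           # can we undo it cheaply?
    low_blast_radius: bool     # few people affected if we are wrong
    known_failure_mode: bool
    time_sensitive: bool       # time is a real constraint
    within_authority: bool     # the current owner can decide
    # Slow signals: any one of them is enough to go slow.
    ambiguous: bool = False
    high_stakes: bool = False
    repeated: bool = False
    cross_functional: bool = False


def choose_depth(risk: RiskProfile) -> str:
    """Return 'slow', 'fast', or 'judgment call' for a sized risk."""
    slow_signals = [
        risk.ambiguous,
        risk.high_stakes,
        risk.repeated,
        risk.cross_functional,
        not risk.reversible,        # expensive to reverse
        not risk.within_authority,  # authority-constrained
    ]
    # Assumption: a single slow signal overrides every fast signal.
    if any(slow_signals):
        return "slow"

    fast_signals = [
        risk.reversible,
        risk.low_blast_radius,
        risk.known_failure_mode,
        risk.time_sensitive,
        risk.within_authority,
    ]
    if all(fast_signals):
        return "fast"

    # Assumption: anything in between is left to the owner's judgment.
    return "judgment call"


config_typo = RiskProfile(
    reversible=True,
    low_blast_radius=True,
    known_failure_mode=True,
    time_sensitive=True,
    within_authority=True,
)
print(choose_depth(config_typo))
```

The point of the sketch is the asymmetry: going fast requires every condition, going slow requires only one.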

Definition: what problem are we solving?

Write the problem without shrinking it.

Include symptom and suspected system. Separate problem from solution. Preserve customer reality. Avoid abstraction that makes action impossible.

A good definition is specific enough to operate and true enough not to mislead.

Constraint: what is limiting progress?

Find the real constraint.

Is it authority, information, capacity, trust, incentives, sequencing, architecture, customer reality, or a decision no one has made?

Look for queues. Look for repeated workarounds. Look for places where effort does not change the outcome.

The constraint determines leverage.

Mode: what are we doing next?

Choose one:

Fast fix: known, reversible, contained.

Deep dive: repeated, ambiguous, consequential.

Research: uncertainty blocks a decision.

Experiment: plausible answer needs field evidence.

Escalation: authority or blast radius exceeds current owner.

Stop: not worth solving, solved enough, wrong level, or decision needed.

Mode clarity prevents action from becoming noise.
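The six modes can be sketched as a small enumeration with one possible selection order. Everything here is illustrative: the `Mode` enum, the `choose_mode` function, its parameter names, and especially the order in which the checks run are assumptions, since the modes above do not come with a ranking.

```python
from enum import Enum


class Mode(Enum):
    FAST_FIX = "fast fix"
    DEEP_DIVE = "deep dive"
    RESEARCH = "research"
    EXPERIMENT = "experiment"
    ESCALATION = "escalation"
    STOP = "stop"


def choose_mode(
    *,
    worth_solving: bool,
    exceeds_owner: bool,
    uncertainty_blocks_decision: bool,
    needs_field_evidence: bool,
    known_reversible_contained: bool,
) -> Mode:
    """Pick exactly one mode. The check order is an assumed priority."""
    if not worth_solving:
        return Mode.STOP
    if exceeds_owner:
        # Authority or blast radius exceeds the current owner.
        return Mode.ESCALATION
    if uncertainty_blocks_decision:
        return Mode.RESEARCH
    if needs_field_evidence:
        return Mode.EXPERIMENT
    if known_reversible_contained:
        return Mode.FAST_FIX
    # Assumption: what remains is repeated, ambiguous, or consequential.
    return Mode.DEEP_DIVE


mode = choose_mode(
    worth_solving=True,
    exceeds_owner=False,
    uncertainty_blocks_decision=False,
    needs_field_evidence=True,
    known_reversible_contained=False,
)
print(mode.value)
```

Whatever order a team picks, the function returns a single mode. That is the property worth keeping.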

Ownership: who carries it?

Assign one accountable owner.

Define decision rights, required inputs, escalation path, resources, deadline, and follow-up signal.

Do not confuse coordination with ownership. Do not assign accountability without authority. Do not hide behind shared ownership when a tradeoff needs one decision owner.

Follow-through: did the problem change?

A problem is not closed when an action is done. It is closed when the relevant signal changes or the company consciously accepts the remaining risk.

Every material problem should have a follow-up point:

Did the incident class recur?
Did the customer escalation pattern change?
Did decision latency improve?
Did quality improve without slowing flow?
Did the owner have enough authority?
Did the experiment answer the question?
Did the stop decision hold?

This is how judgment compounds.

The lightweight template

For meaningful problems, use a one-page problem record:

Signal:
Classification:
Risk (reversibility, blast radius, urgency, authority):
Problem definition:
Suspected constraint:
Mode:
Owner:
Decision rights / escalation:
Next action:
Follow-up signal:
Stop/reopen condition:

Not every problem needs a document. But every consequential problem needs these answers somewhere. For small problems, that may be a Slack note. For medium problems, a short record. For large problems, an explicit decision log with owner and follow-through date.
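If the record lives in a tool rather than a document, the template might look like the sketch below. The `ProblemRecord` class and its `missing` helper are hypothetical, and plain strings are assumed for every field to keep the record as light as a Slack note.

```python
from dataclasses import dataclass, fields


@dataclass
class ProblemRecord:
    # One field per line of the one-page template.
    signal: str = ""
    classification: str = ""
    risk: str = ""
    problem_definition: str = ""
    suspected_constraint: str = ""
    mode: str = ""
    owner: str = ""
    decision_rights: str = ""
    next_action: str = ""
    follow_up_signal: str = ""
    stop_reopen_condition: str = ""

    def missing(self) -> list[str]:
        """Names of the template fields that are still blank."""
        return [
            f.name
            for f in fields(self)
            if not getattr(self, f.name).strip()
        ]


record = ProblemRecord(
    signal="Three enterprise customers reported failed exports this week",
    owner="Support engineering lead",
)
print(record.missing())
```

A blank field is not a failure. It is a prompt: for a consequential problem, someone should be able to answer it somewhere.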

Cadence

The operating system should appear in existing forums.

Incident review: classification, blast radius, prevention, detection, owner.

Executive meeting: decisions, tradeoffs, escalation, stop calls.

Product review: customer evidence, unknowns, experiments, constraints.

Operating review: recurring bottlenecks, ownership, follow-through.

People review: role constraints, manager ownership, trust, accountability.

Do not create a new meeting unless the current system has no route for the problem.

Shared language is the product

The biggest value of the operating system is shared language.

“This is a fast fix.”

“This needs a deep dive.”

“This is research, not a decision yet.”

“This should be an experiment.”

“This is above our authority.”

“This is solved enough.”

“This is the wrong owner.”

That language lets teams move faster without pretending every problem is simple. It lets leaders slow down without turning judgment into bureaucracy.

The goal is not process compliance. The goal is better problem-solving throughput: fewer fake fixes, fewer zombie analyses, fewer unclear owners, fewer late escalations, fewer problems solved at the wrong depth.

Good operators do not solve every problem the same way.

They build a system that helps the company know the difference.

Keep it lightweight or it will be bypassed

The operating system only works if people use it when pressure rises. If it becomes a heavy template, teams will route around it. If it requires too much polish, bad news will slow down. If every small issue needs formal intake, operators will either ignore the system or drown in it.

The standard should be proportionality. Small problems need shared language. Medium problems need a short record. Large problems need explicit decision rights, evidence, and follow-through. The system should help people think, not make them perform compliance.

A good test: does the operating system make the next right action clearer within minutes? If not, simplify it.

This is part 10 of 10 in The Right Depth for the Problem.
