The practical point of this paper is that AI adoption should be analyzed at the workflow level. A task may look only partly automatable in isolation, but if AI can take over several neighboring steps in sequence, the economics can change abruptly.
Source note: Mert Demirer, John J. Horton, Nicole Immorlica, Brendan Lucier, and Peyman Shahidi. “Chaining Tasks, Redefining Work: A Theory of AI Automation.” February 11, 2026. https://demirermert.github.io/Papers/Demirer_chaining_tasks_ai_automation.pdf
Why This Paper Matters
Most discussions of AI exposure ask a task-by-task question: can AI do this specific activity better, faster, or cheaper than a human?
That framing is useful, but it misses how work actually happens. Real work is sequential. A person defines a question, gathers context, applies a method, produces an output, checks it, hands it to someone else, and then another step begins. The value of automating one step depends heavily on what sits before and after it.
This paper builds an economic model around that simple fact.
Its central argument is that AI does not only substitute for individual tasks. AI can create contiguous blocks of machine-executed work, which the paper calls AI chains. When that happens, a human does not need to inspect every intermediate step. The human can instead verify the final output of the chain.
That changes the automation decision. A step that would not be worth automating alone may become worth automating when it can be attached to a neighboring AI step. The benefit is not only that AI performs the step. The benefit is that the firm avoids another handoff, another verification moment, and another boundary between pieces of work.
This is a better lens for the strange adoption pattern many companies are seeing. AI is not evenly useful across a job description. It works in clumps. It creates sudden productivity jumps when enough adjacent steps become reliable. And it often requires reorganizing the job before the gains show up.
The Idea in Plain English
The paper distinguishes three ways a production step can be completed.
A step is manual when a human does it without AI.
A step is augmented when AI does the work but a human reviews and approves the result.
A step is automated when AI completes it and passes the result directly to another AI step without human review.
The important unit is the AI chain: a sequence of automated steps followed by one augmented step. The automated steps happen under the hood. The human interacts with the last step and verifies the final output.
The paper uses a data scientist workflow as an example. Suppose the job involves defining a business question, finding the data, building an analysis pipeline, drafting a report, and presenting the findings. AI might search for the data, build the pipeline, and draft the report. The human then reviews the report. In that case, the data search and pipeline steps are automated inside the chain, while the report draft is augmented because the human checks it.
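The labeling in this example can be sketched in a few lines of Python. This is an illustration only, not the paper's formal model: the function name `label_steps` and the rule "the last AI step in a run is the reviewed one" are assumptions drawn from the description above.

```python
from typing import List

def label_steps(ai_executes: List[bool]) -> List[str]:
    """Label each step as manual, automated, or augmented.

    Assumption: a maximal run of consecutive AI-executed steps is one chain.
    The last step of the run is reviewed by a human (augmented); earlier
    steps hand their output straight to the next AI step (automated).
    """
    labels = []
    for i, is_ai in enumerate(ai_executes):
        if not is_ai:
            labels.append("manual")
            continue
        next_is_ai = i + 1 < len(ai_executes) and ai_executes[i + 1]
        labels.append("automated" if next_is_ai else "augmented")
    return labels

# The data scientist workflow from the text (which steps AI executes is assumed).
steps = ["define question", "find data", "build pipeline",
         "draft report", "present findings"]
ai_executes = [False, True, True, True, False]

for step, label in zip(steps, label_steps(ai_executes)):
    print(f"{step}: {label}")
```

Under these assumptions, the data search and pipeline steps come out as automated and the report draft as augmented, matching the description above.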
This matters because verification is costly. If every AI-generated intermediate output needs human approval, the workflow can become slower rather than faster. But if several AI steps can be chained and checked at the end, the economics improve.
That is the paper’s departure from standard comparative advantage logic. In a classic task model, each step goes to whichever input, human or machine, is relatively best at that step. In this model, adjacency matters. A firm may let AI handle a step even when a human is better at that step in isolation, because including the step inside an AI chain avoids another expensive break in the workflow.
What the Researchers Tested
The paper has two parts: a formal model and an empirical test.
The formal model treats production as an ordered sequence of steps. Firms decide which steps should be manual, which should be augmented with AI, and which should be fully automated inside AI chains. They also decide how to bundle steps into tasks and jobs.
The short-run version keeps job boundaries, wages, and worker skills fixed. The firm chooses how to deploy AI inside existing roles to minimize the time needed to complete the work.
The long-run version lets the firm redesign jobs. Once AI can handle enough neighboring steps, the firm may change the boundaries of work itself. It may need different workers, fewer handoffs, different review points, and different skill bundles.
The empirical section then tests three predictions using occupational task data. The authors use O*NET task data, AI exposure labels from prior research, realized AI execution data from Anthropic’s Economic Index, and GPT-5-mini-generated workflow orderings for 872 occupations.
The tests ask whether AI-executed steps appear in chains, whether dispersed AI-suitable steps are less likely to be executed, and whether a step is more likely to be executed by AI when its immediate neighbors are also executed by AI.
What They Found
AI-suitable work clusters
The first finding is that AI-executed steps are not randomly scattered across workflows.
The paper reports an average AI chain length of 1.45 in the empirical data. That may sound short, but it is significantly longer than would be expected if task positions or execution labels were assigned at random.
The practical reading is that today’s AI already shows some local workflow structure. It does not merely pick off isolated tasks. It tends to appear in short adjacent runs.
That matters because short chains can become longer chains as AI quality improves. The first reliable adjacent steps are where larger workflow redesign starts to become plausible.
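A comparison against random assignment of this kind can be sketched as follows. This is a toy illustration, not the authors' procedure: the workflows are invented, and shuffling AI labels within each workflow is only one assumed way to build a random benchmark.

```python
import random
from typing import List

def chain_lengths(ai_executes: List[bool]) -> List[int]:
    """Lengths of maximal runs of consecutive AI-executed steps."""
    lengths, run = [], 0
    for is_ai in ai_executes:
        if is_ai:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    if run:
        lengths.append(run)
    return lengths

def mean_chain_length(workflows: List[List[bool]]) -> float:
    """Average run length pooled over all workflows (0.0 if no AI steps)."""
    pooled = [n for wf in workflows for n in chain_lengths(wf)]
    return sum(pooled) / len(pooled) if pooled else 0.0

def shuffled_baseline(workflows: List[List[bool]],
                      n_draws: int = 1000, seed: int = 0) -> float:
    """Mean chain length when AI labels are shuffled within each workflow."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        shuffled = [rng.sample(wf, len(wf)) for wf in workflows]
        total += mean_chain_length(shuffled)
    return total / n_draws

# Invented toy workflows: True marks a step executed by AI.
workflows = [
    [False, True, True, True, False, False],
    [True, True, False, False, False, True],
]
print("observed:", mean_chain_length(workflows))
print("shuffled baseline:", round(shuffled_baseline(workflows), 2))
```

If the observed mean sits well above the shuffled baseline, AI-executed steps are more clustered than chance would produce. A real test would also need a significance calculation, which this sketch omits.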
Fragmentation lowers real AI use
The second finding is about fragmentation.
The paper defines a fragmentation index to measure how dispersed AI-suitable steps are across a workflow. A job where AI-capable steps sit next to each other has low fragmentation. A job where AI-capable steps are separated by steps that AI handles poorly has high fragmentation.
The empirical result is straightforward: controlling for the share of AI-exposed steps, occupations with more fragmented AI-suitable work show lower realized AI execution.
That is a useful correction to crude exposure scores. Two jobs can have the same share of AI-exposed tasks and very different automation potential. The job where those tasks cluster is more likely to benefit from chaining. The job where they are interleaved with human-heavy steps may need more supervision, more handoffs, and more reorganization before AI helps much.
This explains why broad AI exposure estimates often feel too blunt. Exposure share tells you how much of the job touches AI-capable activity. Fragmentation tells you whether those activities can be turned into a working production system.
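The difference between exposure share and fragmentation can be shown with a toy measure. The paper's actual fragmentation index is not reproduced in this summary, so `fragmentation_proxy` below is a hypothetical stand-in: it counts how many separate runs the AI-suitable steps fall into, scaled so that 0 means one contiguous block and 1 means every AI-suitable step is isolated.

```python
from typing import List

def exposure_share(ai_suitable: List[bool]) -> float:
    """Share of steps that are AI-suitable."""
    return sum(ai_suitable) / len(ai_suitable) if ai_suitable else 0.0

def fragmentation_proxy(ai_suitable: List[bool]) -> float:
    """Hypothetical proxy, NOT the paper's index.

    (number of runs - 1) / (number of AI-suitable steps - 1),
    defined as 0.0 when there are fewer than two AI-suitable steps.
    """
    n = sum(ai_suitable)
    if n <= 1:
        return 0.0
    runs = sum(
        1 for i, s in enumerate(ai_suitable)
        if s and (i == 0 or not ai_suitable[i - 1])
    )
    return (runs - 1) / (n - 1)

# Two invented jobs with the same exposure share.
clustered = [False, True, True, True, False, False]
scattered = [True, False, True, False, True, False]

for name, job in [("clustered", clustered), ("scattered", scattered)]:
    print(name, exposure_share(job), fragmentation_proxy(job))
```

Both toy jobs have half their steps exposed, yet the proxy scores the clustered job at 0.0 and the scattered job at 1.0, which is the distinction a crude exposure score cannot see.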
Neighboring steps create local complementarities
The third finding is that adjacency itself predicts execution.
When conceptually similar steps appear across different occupations, a step is more likely to be executed by AI if its immediate neighbors are also executed by AI. The effect survives occupation-family and task fixed effects. More distant neighbors matter far less.
That is exactly what the chaining model predicts. The complementarity is local. The step next to you matters because it can share context, output format, verification, and handoff structure. A similar step several positions away shares little of that structure, so it matters much less.
For builders, this is the key design lesson. The real question is often not “can AI do this step?” It is “can AI do this step, the previous step, and the next step with one coherent review boundary?”
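The adjacency pattern can be illustrated with a simple descriptive contrast: how often is a step executed by AI when at least one immediate neighbor is, versus when neither is? This sketch uses invented workflows and a raw comparison of rates. It is not the paper's regression and includes none of its fixed effects; the function name `neighbor_rates` is hypothetical.

```python
from typing import List, Tuple

def neighbor_rates(workflows: List[List[bool]]) -> Tuple[float, float]:
    """Return (AI rate with an AI neighbor, AI rate with no AI neighbor)."""
    # has_ai_neighbor -> [AI-executed steps, all steps]
    counts = {True: [0, 0], False: [0, 0]}
    for wf in workflows:
        for i, is_ai in enumerate(wf):
            left = i > 0 and wf[i - 1]
            right = i + 1 < len(wf) and wf[i + 1]
            has_neighbor = bool(left or right)
            counts[has_neighbor][1] += 1
            counts[has_neighbor][0] += int(is_ai)

    def rate(pair: List[int]) -> float:
        ai_steps, all_steps = pair
        return ai_steps / all_steps if all_steps else 0.0

    return rate(counts[True]), rate(counts[False])

# Invented toy workflows: True marks a step executed by AI.
workflows = [
    [False, True, True, True, False, False],
    [True, True, False, False, False, True],
]
with_neighbor, without_neighbor = neighbor_rates(workflows)
print(f"AI rate with an AI neighbor:    {with_neighbor:.2f}")
print(f"AI rate without an AI neighbor: {without_neighbor:.2f}")
```

A gap between the two rates is only suggestive. Separating a genuine local complementarity from the fact that some occupations or task types are simply more AI-friendly is what the fixed effects in the paper's specification are for.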
Why It Happens
The paper’s mechanism is verification.
AI augmentation requires a human to inspect the output. That inspection cost is not incidental. In many business workflows, review is where trust, liability, quality, and accountability live.
If every AI step creates a review point, automation can fragment work. A person must keep stopping, inspecting, correcting, and restarting the process. In that world, AI may still help, but the productivity gain is limited by supervision overhead.
AI chaining changes the unit of review. The human no longer verifies every internal step. The human verifies the final output of a block.
This creates a non-linear payoff. One isolated AI step saves some execution time but still requires review. Two adjacent AI steps may save more than twice as much if the second step can consume the first step’s output without a human in the middle. Three adjacent AI steps can start to look like a new task boundary rather than a faster version of the old process.
The cost is reliability. Longer chains compound error risk. A chain only works when the final output is good enough to review directly and when intermediate errors are either unlikely, recoverable, or detectable at the end.
So the adoption pattern is lumpy. A small improvement in AI quality may do little until it crosses the threshold where a longer chain becomes viable. Then the optimal workflow can change suddenly.
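A back-of-the-envelope calculation shows both sides of this trade-off. The numbers and the cost structure below are assumptions chosen for illustration, not the paper's model: each step takes a human 10 minutes, AI 1 minute, each human review 6 minutes, and each AI step is correct with probability 0.95 independently of the others.

```python
def time_saved(k: int, human: float = 10.0, ai: float = 1.0,
               review: float = 6.0, chained: bool = True) -> float:
    """Minutes saved versus doing k steps manually.

    chained=True:  one review at the end of the k-step chain.
    chained=False: a review after every AI step.
    """
    manual = k * human
    reviews = 1 if chained else k
    return manual - (k * ai + reviews * review)

def chain_reliability(k: int, per_step: float = 0.95) -> float:
    """Probability that all k steps are correct, assuming independent errors."""
    return per_step ** k

for k in (1, 2, 3):
    print(
        f"k={k}: chained saves {time_saved(k):.0f} min, "
        f"step-by-step review saves {time_saved(k, chained=False):.0f} min, "
        f"chain reliability {chain_reliability(k):.3f}"
    )
```

With these assumed numbers, one reviewed AI step saves 3 minutes, while a two-step chain saves 12, four times as much, because the second review disappears. The reliability column moves the other way, from 0.95 toward roughly 0.86 at three steps, which is why a chain only pays off once per-step quality is high enough.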
What This Means for Builders
Builders should stop designing AI features around isolated tasks and start mapping chains of adjacent work.
A practical product discovery process would ask:
- What is the actual sequence of work?
- Which steps can AI perform reliably today?
- Which AI-capable steps sit next to each other?
- Where does human verification really need to happen?
- What context, format, state, or tool contract lets one AI step feed the next?
The best AI products will often be workflow products, not prompt boxes. They will preserve state across steps, enforce schemas, manage intermediate outputs, expose review points, and make failure inspectable.
This also changes evaluation. It is not enough to evaluate one model call. Builders need chain-level evals: can the system take a real input, run the adjacent steps, produce a reviewable output, and recover cleanly when a middle step fails?
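A chain-level eval can be sketched as a small harness. Everything here is hypothetical: `Step`, `run_chain`, and `evaluate_chain` are illustrative names, and the three stub steps stand in for real model calls. The point is the shape: each step has a cheap contract check on its output, failures are attributed to a named step, and the acceptance check runs only on the final output.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional

@dataclass
class Step:
    name: str
    run: Callable[[Any], Any]       # stand-in for a model or tool call
    check: Callable[[Any], bool]    # contract check on this step's output

@dataclass
class ChainResult:
    output: Any
    failed_step: Optional[str]      # step that raised or broke its contract

def run_chain(steps: List[Step], payload: Any) -> ChainResult:
    """Run steps in order, stopping at the first error or contract violation."""
    for step in steps:
        try:
            payload = step.run(payload)
        except Exception:
            return ChainResult(output=None, failed_step=step.name)
        if not step.check(payload):
            return ChainResult(output=None, failed_step=step.name)
    return ChainResult(output=payload, failed_step=None)

def evaluate_chain(steps: List[Step], cases: List[Any],
                   accept: Callable[[Any, Any], bool]) -> float:
    """Share of cases that finish and whose final output passes `accept`."""
    passed = 0
    for case in cases:
        result = run_chain(steps, case)
        if result.failed_step is None and accept(case, result.output):
            passed += 1
    return passed / len(cases) if cases else 0.0

# Stub steps (assumptions, not real integrations).
steps = [
    Step("find_data",
         lambda q: {"question": q, "rows": [1, 2, 3]},
         lambda out: bool(out["rows"])),
    Step("build_pipeline",
         lambda d: {**d, "mean": sum(d["rows"]) / len(d["rows"])},
         lambda out: "mean" in out),
    Step("draft_report",
         lambda d: f"{d['question']}: mean={d['mean']:.1f}",
         lambda out: isinstance(out, str)),
]

print(run_chain(steps, "avg order size"))
print(evaluate_chain(steps, ["q1", "q2"], lambda case, out: out.startswith(case)))
```

Recording `failed_step` is the design choice that matters most: it keeps the unreviewed middle of the chain inspectable even though the human only reviews the final output.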
The paper also helps explain why some AI agents feel impressive in demos but weak in production. Demos often show a chain when everything goes right. Production needs to know where the chain should stop, what the human should review, and how much risk is hiding inside the unreviewed middle.
What This Means for Buyers and Operators
For buyers, the paper gives a sharper diligence question than “how much of this job can AI automate?”
Ask where the AI chain is.
If a vendor claims to automate a workflow, map the sequence. Which steps are fully automated? Which step is augmented and reviewed by a human? Where are the handoffs? What intermediate outputs are hidden? What evidence lets the reviewer trust the final result?
The answer determines whether the product is saving real work or merely moving work into review.
For operators, the fragmentation index is especially useful. A team can look at a job and ask whether AI-suitable steps are clustered or scattered. Clustered steps are better candidates for near-term automation. Scattered steps may still benefit from copilots, but they are less likely to produce large end-to-end gains until the surrounding workflow changes.
This also affects reorganization. The biggest gains may not come from putting AI into today’s job descriptions. They may come from redrawing job boundaries around the new chains. That is why the paper connects AI adoption to the productivity J-curve: early adoption can look disappointing because firms have not yet redesigned work around the new technology.
What to Watch Next
The first thing to watch is whether AI products move from task copilots to chain-owning systems.
The second thing to watch is whether enterprises measure fragmentation. AI exposure maps are common. Workflow adjacency maps are rarer and more useful.
The third thing to watch is review design. The winning systems will not remove humans everywhere. They will put humans at the right verification boundaries.
The fourth thing to watch is discontinuity. If the paper is right, AI progress will not translate smoothly into productivity. Some model improvements will matter little. Others will suddenly make longer chains viable and force job redesign.
Finally, watch for better empirical traces. O*NET and AI execution data are useful, but direct workflow logs from real companies would make this theory much easier to test.
Limitations and Caveats
The empirical workflow ordering is generated with GPT-5-mini rather than directly observed from workplace process logs. The authors test robustness across ten alternative prompt formulations, but the ordering still depends on a model-generated reconstruction of occupational workflows.
The data also reflects current AI execution patterns, not a mature deployment environment. As tools, integrations, and organizational practices improve, average chain length could change.
The formal model makes simplifications, as all useful models do. It abstracts away from messy organizational factors such as politics, compliance, incentives, worker resistance, quality variation, procurement friction, and implementation capacity.
The paper is strongest as a theory of why AI value is local, sequential, and threshold-driven. It should not be read as proof that every clustered workflow should be automated. Some chains will still fail because the review point is too weak, the hidden error risk is too high, or the organization cannot maintain the workflow.
The broader lesson holds: AI adoption is not only about which tasks models can do. It is about which adjacent steps can be chained into a new unit of work.
Source
Demirer, Mert; Horton, John J.; Immorlica, Nicole; Lucier, Brendan; and Shahidi, Peyman (2026). Chaining Tasks, Redefining Work: A Theory of AI Automation. Available at: https://demirermert.github.io/Papers/Demirer_chaining_tasks_ai_automation.pdf