Agents Change Work by Lowering the Cost of Execution

When AI moves from answering questions to executing workflows, the biggest change is not faster chat. It is a different cost structure for knowledge work.

Source note: Jeremy Yang, Kate Zyskowski, Noah Yonack, and Jerry Ma. “How AI Agents Reshape Knowledge Work: Autonomy, Efficiency, and Scope.” arXiv:2606.07489, June 5, 2026. https://arxiv.org/abs/2606.07489

Why This Paper Matters

Most evidence about AI productivity still comes from assistant-style tools. A worker asks a model for help, reads the answer, and then performs the next step manually. That setup can make writing, coding, analysis, or support work faster, but the human still has to coordinate the process. The model helps inside the task. It does not own the workflow.

This paper studies a different shift: the move from conversational assistants to autonomous agents. Using production data from Perplexity, the authors compare Perplexity Search, an answer engine, with Perplexity Computer, a general-purpose agent orchestration product. Search represents the conversational baseline. Computer represents the agent mode, where the user specifies an outcome and the system searches, browses, codes, creates documents, uses external services, delegates work to subordinate agents, and persists until it returns a deliverable.

That comparison matters because the economics are different. A chatbot reduces the cost of thinking about the next step, but the worker still pays for execution with their own time. An agent raises the upfront cost of delegation and review, but lowers the marginal cost of every intermediate step. If that tradeoff is real, agents should not merely speed up existing tasks. They should change which tasks people attempt in the first place.

The paper’s claim is that this is exactly what happens. Computer sessions perform much more autonomous machine work than Search sessions, reduce estimated completion time and cost on matched tasks, and expand the scope of work users attempt across occupations, cognitive levels, expertise domains, and task bundles.

The Idea in Plain English

Imagine two ways to get a complicated report done.

In the first version, you use a very good research assistant who answers questions instantly. You ask for market data. You ask for sources. You ask for examples. Every answer helps, but you still have to decide the next query, open the spreadsheet, make the chart, write the document, and revise the result. The assistant lowers the cost of each bit of information. It does not remove the cost of doing the work.

In the second version, you delegate the outcome. You still have to explain what you want. You still have to check the final output. But the middle of the process is handed off. The agent decomposes the task, gathers information, uses tools, creates artifacts, and comes back with something closer to finished work.

The paper formalizes this distinction with a simple task model. Conversational AI has a low fixed cost and a high marginal cost per step. It is easy to ask one quick question, but long tasks still require lots of human effort. Agentic AI has a higher fixed cost and a lower marginal cost per step. Delegation and verification take more care, but once the agent starts, additional steps are much cheaper for the human.

That means conversational tools should be best for short tasks: lookups, clarifications, and simple synthesis. Agents should become more attractive as task length increases. Once a task has enough steps, the higher fixed cost of delegation is amortized, and the lower execution cost dominates.
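The fixed-versus-marginal cost argument above can be sketched as a small linear model. This is a minimal illustration, not the paper's formal model: the Mode class, the function names, and every number below are hypothetical and chosen only to show how a breakeven task length arises.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    """A way of working, with human effort measured in minutes (hypothetical units)."""
    name: str
    fixed_cost: float     # upfront human effort: prompting, delegating, reviewing
    marginal_cost: float  # human effort for each additional task step


def human_cost(mode: Mode, steps: int) -> float:
    """Total human minutes for a task with the given number of steps."""
    return mode.fixed_cost + mode.marginal_cost * steps


def breakeven_steps(chat: Mode, agent: Mode) -> float:
    """Task length at which both modes cost the human the same amount."""
    return (agent.fixed_cost - chat.fixed_cost) / (chat.marginal_cost - agent.marginal_cost)


# Illustrative numbers only; they are NOT taken from the paper.
chat = Mode("conversational", fixed_cost=1.0, marginal_cost=5.0)
agent = Mode("agentic", fixed_cost=15.0, marginal_cost=0.5)

threshold = breakeven_steps(chat, agent)
for steps in (1, 3, 10, 40):
    cheaper = min((chat, agent), key=lambda m: human_cost(m, steps))
    print(f"{steps:>2} steps: chat={human_cost(chat, steps):6.1f}  "
          f"agent={human_cost(agent, steps):6.1f}  cheaper={cheaper.name}")
print(f"breakeven at about {threshold:.1f} steps")
```

With these made-up parameters the conversational mode wins for tasks of up to three steps and the agentic mode wins beyond that. Different parameters move the threshold, but the shape of the argument stays the same as long as the agent has the higher fixed cost and the lower marginal cost.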

The practical consequence is a wider work frontier. If complex tasks become cheaper to attempt, users should not only complete old work faster. They should attempt work that was previously too annoying, too multidisciplinary, or too coordination-heavy to bother with.

What the Researchers Tested

The authors analyze Perplexity usage from February 27 through May 27, 2026, shortly after Computer launched. Their main empirical setting compares Search and Computer usage among users who used both products during the same period.

The cleanest part of the design uses matched sessions. The authors identify 10,000 session pairs where the same user submitted near-identical initial queries to Search and Computer, using cosine similarity above 0.99. This design is useful because it holds the underlying task unusually close. Instead of comparing easy Search queries with hard Computer queries, it compares cases where the same user appeared to try the same task in both modes.
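A matching rule of this kind can be sketched in a few lines. The sketch assumes each session's initial query has already been turned into an embedding vector; the paper's embedding model and pipeline are not described here, so the data layout, the function names, and the toy vectors are all hypothetical. Only the same-user requirement and the 0.99 threshold come from the text above.

```python
import math
from typing import List, Sequence, Tuple

Session = Tuple[str, str, Sequence[float]]  # (user_id, session_id, query_embedding)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_sessions(search: List[Session], computer: List[Session],
                   threshold: float = 0.99) -> List[Tuple[str, str, str]]:
    """Pair Search and Computer sessions from the same user with near-identical first queries."""
    pairs = []
    for s_user, s_id, s_vec in search:
        for c_user, c_id, c_vec in computer:
            if s_user == c_user and cosine_similarity(s_vec, c_vec) > threshold:
                pairs.append((s_user, s_id, c_id))
    return pairs


# Toy embeddings, invented for illustration.
search_sessions = [("u1", "s1", [1.0, 0.0, 0.2]), ("u2", "s2", [0.0, 1.0, 0.0])]
computer_sessions = [
    ("u1", "c1", [1.0, 0.01, 0.2]),  # nearly identical to s1, same user
    ("u1", "c2", [0.0, 1.0, 0.0]),   # identical to s2, but a different user
    ("u2", "c3", [0.5, 0.5, 0.0]),   # same user as s2, but a different query
]
matched = match_sessions(search_sessions, computer_sessions)
print(matched)
```

The nested loop is quadratic and only suitable for a toy example. A production pipeline would more plausibly group sessions by user first or use an approximate nearest-neighbor index, but that is a guess about implementation, not something the paper states.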

They then measure several things.

For autonomy, they compare machine execution time, tool use, connector calls, and follow-up query behavior. They also classify follow-up turns in 1,000 matched multi-turn sessions to see whether users are still manually directing the system or moving into verification and extension.

For efficiency, they estimate the time a human using Search would need to perform the same work manually and compare it with the time required by the Computer-plus-human workflow. The paper uses tool-call traces, standardized wage estimates, independent LLM-based time estimates, and interviews with 25 active users to triangulate the result.

For scope, they classify large samples of queries against Bloom’s Revised Taxonomy and O*NET occupational taxonomies. This lets the authors ask whether Computer queries are more cognitively complex, whether they cross occupational boundaries more often, whether they draw on more knowledge domains, and whether they combine more work activities into one request.

The study is not a lab experiment where users are randomly assigned to use one product or another. It is field evidence from a production system. That gives it scale and realism, but it also means the authors rely on matching, classification, and estimation rather than perfect experimental control.

What They Found

The first result is adoption. Computer usage grew quickly over the three-month window, with cumulative queries reaching 84 times their first-week total. In a random sample of 100,000 Computer queries, Research and Analysis made up 25.8 percent of use cases, and Document and Asset Creation made up 18.6 percent. Roughly a third of expected outputs were structured artifacts such as documents, websites, codebases, or spreadsheets.

The second result is autonomy. In matched sessions with near-identical initial queries, Computer performed 26 minutes of autonomous work per user session. Search performed 33 seconds. That is a 48-fold increase in machine work. Computer also used external tools much more often. The paper reports that Computer invoked connector calls four times as often as Search, with especially large gaps in domains such as finance.

This autonomy did not show up as lower quality in the paper’s dissatisfaction measure. Per-query medium-to-high dissatisfaction was 1.3 percent for Computer and 2.9 percent for Search, a 55 percent reduction. The authors also find a shift in follow-up behavior. Search users are more likely to continue steering and clarifying. Computer users are more likely to verify, extend, or build on a delivered artifact.

The third result is efficiency. On matched tasks, the authors estimate that a human using Search alone would take 269 minutes to complete the full task. With Computer plus human review, average completion time falls to 36 minutes. That is an 87 percent reduction in time. Using wage-based cost estimates, the paper estimates a 94 percent cost reduction. A breakeven analysis suggests that a Search-aided human would need to complete all manual steps in under 20 minutes to match the Computer plus human cost.
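The headline percentages can be checked against the rounded figures quoted in this article. The short script below is only an arithmetic check; the helper name is hypothetical, and the inputs are the rounded values reported above, not the paper's underlying data.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is lower than `before`."""
    return (before - after) / before * 100


# Efficiency: 269 minutes with Search alone versus 36 minutes with Computer plus review.
time_reduction = percent_reduction(269, 36)

# Dissatisfaction: 2.9 percent for Search versus 1.3 percent for Computer.
dissatisfaction_reduction = percent_reduction(2.9, 1.3)

# Autonomy: 26 minutes of machine work versus 33 seconds, both converted to seconds.
autonomy_fold = (26 * 60) / 33

print(f"time reduction: {time_reduction:.1f}%")
print(f"dissatisfaction reduction: {dissatisfaction_reduction:.1f}%")
print(f"autonomy ratio from rounded inputs: {autonomy_fold:.1f}x")
```

The time and dissatisfaction figures round to the reported 87 and 55 percent. The autonomy ratio computed from the rounded inputs comes out slightly below the reported 48-fold figure. One plausible explanation is that the paper computes the ratio from unrounded session times, but that is an assumption.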

The fourth result is scope expansion. Computer users do not simply do the same work faster. They attempt different work. Horizontally, Computer queries cross occupational boundaries more often than Search queries from the same users. This pattern holds across all eight occupation clusters in the paper, with an average gap of 9 percentage points.

Vertically, Computer queries are more complex. The paper finds that 71 percent of Computer queries are abstract non-routine tasks versus 53 percent for Search. Higher-order Bloom cognition appears in 76 percent of Computer queries versus 55 percent for Search. Create-level work accounts for 50 percent of Computer queries versus 26 percent of Search queries.

Computer queries also draw on broader expertise. The average Computer query requires 2.40 O*NET knowledge domains versus 1.74 for Search, and Computer queries are nearly three times as likely to require three or more domains. At finer grains of work activity, the gap widens: Computer queries engage more generalized work activities, intermediate activities, detailed work activities, and occupation-specific task statements. The paper reports that 23 percent of Computer queries involve at least one O*NET task statement that never appears in the same users’ Search queries.
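The last statistic, the share of Computer queries that involve a task statement absent from the same user's Search history, amounts to a per-user set difference. The sketch below shows one way such a share could be computed. It assumes each query has already been labeled with a set of task-statement identifiers; the labeling step, the function name, and the toy identifiers are hypothetical and do not reproduce the paper's classification pipeline.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

LabeledQuery = Tuple[str, Set[str]]  # (user_id, task_statement_ids)


def share_with_novel_tasks(search_queries: List[LabeledQuery],
                           computer_queries: List[LabeledQuery]) -> float:
    """Fraction of Computer queries with at least one task the same user never sent to Search."""
    seen_in_search: Dict[str, Set[str]] = defaultdict(set)
    for user, tasks in search_queries:
        seen_in_search[user] |= tasks

    if not computer_queries:
        return 0.0
    novel = sum(1 for user, tasks in computer_queries if tasks - seen_in_search[user])
    return novel / len(computer_queries)


# Toy labels, invented for illustration.
search_log = [("u1", {"t1", "t2"}), ("u2", {"t3"})]
computer_log = [
    ("u1", {"t1"}),        # already seen in u1's Search queries
    ("u1", {"t2", "t9"}),  # t9 is new for u1
    ("u2", {"t3"}),        # already seen in u2's Search queries
    ("u2", {"t1"}),        # t1 is new for u2, even though u1 used it in Search
]
novel_share = share_with_novel_tasks(search_log, computer_log)
print(f"{novel_share:.0%} of Computer queries include a task new to that user")
```

The comparison is deliberately per user: a task counts as new only relative to that user's own Search queries, which matches the "same users" wording above.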

Why It Happens

The paper’s mechanism is cost structure. Agents are not just better answer generators. They shift the user from operator to supervisor.

With a conversational assistant, the worker is still responsible for the sequence. They must decide what to ask next, carry information between tools, interpret each answer, perform software actions, and assemble the final artifact. As the number of steps rises, the human cost rises with it. That makes many potentially valuable tasks uneconomical for one person to attempt.

With an agent, the human pays more at the beginning and end. Delegation requires a clearer outcome, more context, and sometimes constraints or approvals. Review requires judgment. But the long middle phase becomes cheaper because the machine does the step-by-step execution. It can search, browse, use tools, create files, and iterate while the user is not manually driving every action.

That change makes expertise boundaries more permeable. A user may not know how to produce a financial model, build a website, draft a legal-style memo, or write code from scratch. But if they can describe the outcome and evaluate the result well enough, the agent can absorb a large share of the execution burden. The bottleneck moves from “Can I personally perform all these steps?” to “Can I specify and verify this work?”

This also explains why users bundle more subtasks into a single query. If each additional step is expensive, users break work into small conversational turns. If additional steps are cheap, users delegate larger composite tasks.

What This Means for Builders

Builders should read this as evidence that agent products need to be designed around delegation, execution, and review, not only around better chat.

The first priority is context integration. Computer’s advantage comes from the ability to act across tools and environments. An agent that can only talk is still close to an assistant. An agent that can read, write, browse, code, call services, and return artifacts starts to change the economics of the task.

The second priority is the delegation interface. If agents have higher fixed costs, product teams should reduce those costs. Good agent products will help users scope objectives, state constraints, choose acceptable sources, expose assumptions, and define review criteria. Prompt boxes alone are a weak interface for delegated work.

The third priority is progress visibility and review. Long-running autonomous work requires trust. Users need to see what the agent is doing, approve sensitive actions, inspect intermediate artifacts, and understand why the final output is credible. The agent should make review easier, not leave the user with a mysterious finished object.

Finally, builders should optimize for composite tasks. The value of agents appears strongest where several steps, tools, and domains need to be combined. Products that only benchmark single-turn answers may miss the main source of agent value.

What This Means for Buyers and Operators

For buyers, the paper suggests that agent ROI should not be judged only by time saved on existing workflows. The bigger effect may be the work frontier: tasks that previously would have required another department, a contractor, a specialist, or a long delay may become feasible for one high-context operator.

This has organizational consequences. If individuals can perform more cross-domain work, some coordination costs fall. A manager may be able to create a first-pass model, brief, landing page, or analysis without waiting for several handoffs. That does not eliminate specialists. It changes when specialists are needed. They may move upstream into standards and review, or downstream into high-stakes exceptions, rather than doing every routine execution step.

Operators should also notice the new bottleneck. If execution gets cheaper, verification becomes more important. The person delegating to the agent must know enough to judge whether the output is accurate, compliant, useful, and aligned with the real objective. Teams that treat agents as magic labor will accumulate errors. Teams that treat agents as execution capacity with strong review loops will capture more of the upside.

The paper also implies that procurement should look beyond seat licenses and model capability. The important questions are operational: what tools can the agent use, what evidence does it expose, what approval gates exist, how well does it preserve context, and how easily can a reviewer audit the result?

What to Watch Next

The next question is whether these individual-level gains translate into organization-level productivity. It is possible for individual tasks to get much faster while teams still bottleneck on approval, compliance, decision rights, integration, or deployment. The history of technology adoption suggests that workflow redesign matters as much as tool capability.

Researchers should also watch whether agent use changes job design. The paper shows users crossing occupational boundaries and bundling more work into single requests. If that pattern persists, jobs may become less defined by narrow execution skills and more defined by judgment, taste, domain framing, and review capability.

Another thing to watch is software design. If agents become major users of software, applications may need better APIs, machine-readable state, permissioning, audit logs, and rollback mechanisms. Agent-native work will stress systems that were built only for human clicking.

Finally, the field should watch whether dissatisfaction remains low as tasks become longer and higher stakes. A 55 percent lower dissatisfaction signal is meaningful, but the hardest agent failures may emerge in work that takes hours or days, crosses more systems, or produces outputs that are hard to verify quickly.

Limitations and Caveats

The paper is valuable because it uses production data, but that also introduces limits.

First, the data comes from one company’s products over one launch-period window. Perplexity Computer users may be early adopters, paid users, or unusually motivated users. Their behavior may not represent the average enterprise employee or casual consumer.

Second, the matched-session design is strong, but it can only compare tasks that appear in both Search and Computer. Some agent tasks may have no Search equivalent. That means the measured efficiency gains are strongest for comparable tasks, while the most agent-native work may be harder to benchmark.

Third, several important measures rely on classification and estimation. O*NET mapping, Bloom classification, dissatisfaction signals, and human-time estimates are useful proxies, not direct observation of economic value. The authors run robustness checks and interviews, but the minute and dollar estimates should be treated as informed estimates rather than exact accounting.

Fourth, the paper studies Perplexity’s own product ecosystem. That does not invalidate the findings, but readers should be careful about generalizing every number to every agent product. A less integrated agent, a different user base, or a weaker review interface could produce different results.

The safest conclusion is not that all agents instantly create 87 percent time savings. It is that autonomy changes the cost curve, and that change can reshape both the speed and scope of knowledge work when the agent is capable enough and the task is suitable for delegation.

Source

Jeremy Yang, Kate Zyskowski, Noah Yonack, and Jerry Ma. (2026). How AI Agents Reshape Knowledge Work: Autonomy, Efficiency, and Scope. arXiv preprint arXiv:2606.07489. Available at: https://arxiv.org/abs/2606.07489
