Research Explainers, May 28, 2026

Self-Improvement Needs a P&L

The paper’s practical point is that recursive self-improvement is easier to talk about when it has accounting. A system is not self-improving because it sounds autonomous. It is self-improving if new gains reliably fund the next cycle of better prediction, action, and deployment.

Source note: York Westenhaver, Massey Branscomb, and Aidan Grant. “Recursive Self-Improvement is a Portfolio Optimization Problem.” AlphaFund whitepaper, accessed May 29, 2026. https://www.alphafund.com/whitepaper

Why This Paper Matters

Recursive self-improvement usually arrives with science-fiction baggage. The standard picture is an AI system that rewrites its own code, gets smarter, rewrites itself again, and moves through a fast takeoff curve. That framing is dramatic, but it often skips the ordinary constraint that every real system faces: improvement costs money, time, data, compute, attention, and risk.

AlphaFund’s whitepaper makes a more grounded move. It asks what recursive self-improvement would look like if it had to show up in a balance sheet.

The answer is not “the model edits itself.” The answer is a loop. A firm spends capital on better prediction and better deployment. Those improvements create measurable economic gains. Some of those gains are reinvested into the next improvement cycle. If the gains are real, and if they arrive faster than old edge decays, the firm compounds its own capability.

That is a useful reframing even if you are skeptical of the specific company claims. It turns recursive self-improvement from a metaphysical argument into an operating question: can the system measure whether each improvement cycle actually paid for the next one?

The Idea in Plain English

The paper treats a company as a bundle of assets that can be improved.

For AlphaFund’s quant-trading case, the assets are split into five channels. Investments are the positions the firm holds. Sensors are the data and feeds that let it see the world. Actuators are the things it can do, such as trade new instruments, route orders through new venues, or access more financing. R&D is the research process that discovers better methods. Parameters are the learned models, weights, rules, and beliefs that guide decisions.

Each channel can receive capital. A dollar can go into a dataset, a new market connection, a research experiment, a model-training run, or a position in the book. The paper’s claim is that a self-improving firm needs to price those choices against one another. It should ask: which dollar is most likely to improve future returns, after uncertainty, decay, market impact, and operating constraints?
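One way to picture that pricing step is a ranking of candidate dollars by uncertainty-adjusted return. The sketch below is an illustration, not the paper’s controller. The channel names come from the paper; the `Candidate` type, the scoring rule (expected gain minus decay minus a penalty on uncertainty), and every number are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    channel: str          # one of the paper's five channels
    expected_gain: float  # hypothetical expected return per dollar
    uncertainty: float    # hypothetical standard error of that estimate
    decay: float          # hypothetical edge lost per dollar over the horizon


def score(c: Candidate, risk_aversion: float = 1.0) -> float:
    """Assumed scoring rule: gain net of decay, penalized by uncertainty."""
    return c.expected_gain - c.decay - risk_aversion * c.uncertainty


def rank_candidates(candidates, risk_aversion: float = 1.0):
    """Sort candidate uses of the next dollar from best to worst score."""
    return sorted(candidates, key=lambda c: score(c, risk_aversion), reverse=True)


# Illustrative numbers only; none of these come from the whitepaper.
candidates = [
    Candidate("investments", expected_gain=0.08, uncertainty=0.03, decay=0.01),
    Candidate("sensors", expected_gain=0.12, uncertainty=0.05, decay=0.02),
    Candidate("actuators", expected_gain=0.10, uncertainty=0.02, decay=0.01),
    Candidate("r_and_d", expected_gain=0.20, uncertainty=0.15, decay=0.02),
    Candidate("parameters", expected_gain=0.09, uncertainty=0.02, decay=0.01),
]

ranked = rank_candidates(candidates)
print([c.channel for c in ranked])
# prints ['actuators', 'parameters', 'sensors', 'investments', 'r_and_d']
```

With these made-up inputs, R&D has the highest raw expected gain but ranks last once its uncertainty is priced in. Setting `risk_aversion=0.0` puts it first, which is the kind of sensitivity a real controller would have to justify.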

That turns the corporation into a portfolio-optimization problem. The portfolio is not just stocks and bonds. It is the whole operating system of the firm.

The paper calls the forecasting object behind this process an Economic World Model, or EWM. The phrase matters because the authors do not mean a generic LLM with a finance prompt. They mean a model trained and evaluated under chronological discipline. At decision time, it can condition only on what the firm knew then. It predicts what happens next to the firm, the environment, and the reward.

In plainer terms: an EWM is a forecasting engine with a no-peeking rule and a P&L feedback loop.
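The no-peeking rule can be sketched as a point-in-time filter. This is a minimal illustration under stated assumptions, not the paper’s data pipeline: the `Observation` type, the `known_at` field, and the sample records are hypothetical. The idea is that each record carries the time the firm could first have seen it, and a forecast at a given decision time may use only records available by then.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Observation:
    name: str
    event_time: datetime  # when the underlying event occurred
    known_at: datetime    # when the firm could first have seen it
    value: float


def visible_at(observations, decision_time):
    """Keep only observations the firm could have known at decision_time."""
    return [o for o in observations if o.known_at <= decision_time]


# Hypothetical example: a quarter-end figure that is published two weeks later.
history = [
    Observation("price", datetime(2026, 3, 31), datetime(2026, 3, 31), 101.5),
    Observation("earnings", datetime(2026, 3, 31), datetime(2026, 4, 15), 2.4),
]

decision_time = datetime(2026, 4, 1)
features = visible_at(history, decision_time)
print([o.name for o in features])  # prints ['price']
```

Filtering on `event_time` instead of `known_at` would let the earnings figure leak into an April 1 decision, which is exactly the kind of backtest flattery the chronological discipline is meant to prevent.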

What the Researchers Tested

This is not a conventional university paper with a clean public benchmark and fully reproducible results. It is a company whitepaper that mixes formal modeling, proprietary operating data, held-out backtests, scaling-law fits, and deployment history.

The paper builds a formal model first. It defines the corporation’s objective as growth in shareholder equity subject to survival. It defines the firm state, action vector, channel histories, and the learned model used to forecast future outcomes. It then defines the capital-allocation controller that chooses where the next dollar should go.

The empirical section tries to estimate the rows of that controller. It covers the five channels (investments, sensors, actuators, R&D, and parameters) plus continual learning. Each row is treated as an asset with a marginal return and an uncertainty.

The headline score is called t-RSI. It measures the gap between alpha creation and alpha decay over a horizon, divided by uncertainty. A positive value means the system appears to be creating new edge faster than existing edge disappears. A high value means the gap is large relative to the estimated noise.

The paper reports AlphaFund’s current three-month t-RSI as 9.61 standardized units. A later posterior chart reports 9.39 under a specific path. Either way, the claim is not subtle: under the paper’s measurement framework, the create side currently dominates the decay side.

What They Found

Recursive self-improvement becomes measurable when the loop is closed

The strongest part of the paper is the operating model. It says that self-improvement needs five things: a state, actions, outcomes, measurement, and reinvestment.

That sounds basic, but it cuts through a lot of AI autonomy talk. Many systems can generate suggestions. Fewer can act. Fewer still can measure whether those actions improved the economic state of the system. Almost none can tie that measurement back into the next capital-allocation decision with a clean audit trail.

Quant trading is a natural setting for the argument because the feedback loop is unusually tight. Positions are marked. Trades settle. Data feeds can be priced. Research experiments can be logged. Model changes can be held out. P&L is noisy, but it is at least in the right unit.

That is why the paper keeps returning to the same constraint: the bridge from improvement to equity has to be measurable. Without that bridge, recursive self-improvement becomes a story about capability. With it, the story becomes a claim about compounding.

The Economic World Model is mostly a discipline, not a magic object

The paper’s EWM concept is useful because it separates prediction from language generation.

A generic LLM can be trained on a static internet snapshot that mixes documents from before and after the event it is asked to predict. That is fine for next-token modeling. It is dangerous for economic forecasting. A forecasting model must respect time. It should not learn from future information and then appear wise in backtests.

The EWM is the firm’s learned approximation of how the firm and the world move after an action. It is scored on future realized outcomes. It also needs channel-specific histories: what the firm observed, what it did, and what happened next.

This is less glamorous than “AI CEO,” but much more operational. The model is valuable only if it improves decisions that can be settled later.

t-RSI is a create-versus-decay score

The most compact idea in the paper is t-RSI. It asks whether the firm creates alpha faster than old alpha decays.

That distinction matters. A model can look good today because it found an edge that is already fading. A research process can look productive because it overfits benchmarks. A trading system can look scalable until market impact starts eating the very edge it is trying to deploy.

t-RSI tries to put those forces into one standardized distance. The numerator is expected alpha creation minus expected alpha decay. The denominator is uncertainty. The paper’s reported value, 9.61 over a three-month horizon, is meant to say that AlphaFund’s current improvement signal is far above estimated noise.

The caveat is obvious and important: the score is only as good as the row laws, decay estimates, execution assumptions, and uncertainty model underneath it.
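The structure of the score, and why the caveat bites, can be shown in a few lines. This is a sketch of the ratio as described in this explainer (creation minus decay, divided by uncertainty), not the paper’s estimator. The function name `t_rsi` and the input values are hypothetical.

```python
def t_rsi(alpha_creation: float, alpha_decay: float, uncertainty: float) -> float:
    """Standardized create-versus-decay gap: (creation - decay) / uncertainty."""
    if uncertainty <= 0:
        raise ValueError("uncertainty must be positive")
    return (alpha_creation - alpha_decay) / uncertainty


# Hypothetical inputs, expressed in the same units over the same horizon.
base = t_rsi(alpha_creation=5.0, alpha_decay=2.0, uncertainty=1.5)

# Same gap, but the true uncertainty is twice the first estimate.
stressed = t_rsi(alpha_creation=5.0, alpha_decay=2.0, uncertainty=3.0)

print(base, stressed)  # prints 2.0 1.0
```

Because the score is a ratio, an uncertainty estimate that is too small by a factor of two doubles the reported value. Purely as arithmetic, the same error would turn a score near 4.8 into the reported 9.61.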

The evidence is interesting, but not all equally public

The paper provides several evidence anchors.

For sensors, it reports a data-scaling decay rate around 0.156, interpreted as roughly 30% of reducible error removed per 10x increase in effective data. For actuators, it reports a Sharpe gain around 0.47 per decade of dollar-weighted tokens from expanding the action surface. For R&D, it analyzes 929 auto-research experiments and reports that the rolling top-10% frontier improved by about 0.34 Sharpe per decade of completed experiments.
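The arithmetic behind those per-decade figures is easy to check. The sketch below assumes a power-law form for the sensor result (reducible error proportional to data raised to the power -0.156) and a log-linear form for the two Sharpe slopes. Both forms are assumptions consistent with how the numbers are quoted here, and the function names are hypothetical.

```python
import math


def error_removed_fraction(exponent: float, data_multiplier: float = 10.0) -> float:
    """Share of reducible error removed if error scales as data ** (-exponent)."""
    return 1.0 - data_multiplier ** (-exponent)


def log_linear_gain(slope_per_decade: float, multiplier: float) -> float:
    """Gain implied by a slope quoted per decade (per 10x) of some input."""
    return slope_per_decade * math.log10(multiplier)


sensor_cut = error_removed_fraction(0.156)    # one decade of effective data
actuator_gain = log_linear_gain(0.47, 100.0)  # two decades of dollar-weighted tokens
research_gain = log_linear_gain(0.34, 10.0)   # one decade of completed experiments

print(round(sensor_cut, 2), round(actuator_gain, 2), round(research_gain, 2))
```

The first value comes out near 0.30, matching the “roughly 30% per 10x” reading. The two-decade actuator figure is an extrapolation: a slope fitted over one range of data is not guaranteed to hold outside it.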

For parameters, the paper places model improvement on a Kaplan/Hoffmann/Chinchilla/Muennighoff-style scaling surface and treats parameter staleness as alpha decay. It reports empirical decay as small and near zero in its panel, with a per-asset median decay rate of roughly 2e-4 to 4e-4 per 60-minute cycle. It also estimates a continual-learning crossover near 3.2e14 dollar-weighted tokens.

These numbers make the paper more concrete than a normal AI strategy essay. They also expose the reader’s trust problem. Some results depend on proprietary surfaces, live trading data, execution costs, market-impact assumptions, and company-maintained ledgers. The paper offers formalism and some audit language, but an outside reader cannot fully reproduce the core t-RSI claim from public information alone.

That does not make the paper useless. It means the right posture is “interesting operating thesis with self-reported evidence,” not “settled proof.”

Why It Happens

The mechanism is compounding under measurement.

If the firm can see more, it can train better predictors. If it has more actuators, it can turn predictions into more kinds of actions. If the research process improves, it can discover better models and better allocation rules. If the parameters improve, the firm can extract more edge from the same environment. If the portfolio earns money, some of that money can pay for more sensors, actuators, R&D, and parameters.

The important part is that the channels reinforce one another. Better sensors can make better R&D possible. Better R&D can make parameters better. Better parameters can make investments better. Better investments can finance more sensors.

That is the self-improvement loop the paper is trying to formalize. It is not a single model making itself smarter in isolation. It is a measured firm using its own profits to improve the machine that produces future profits.
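A toy simulation makes the create-versus-decay race concrete. This is not the paper’s model. The dynamics, the function `simulate_loop`, and all parameter values are invented for illustration: existing edge fades by a fixed fraction each period, and new edge is created in proportion to the share of equity reinvested.

```python
def simulate_loop(edge, decay, research_yield, reinvest_fraction, periods):
    """Toy create-versus-decay loop; every parameter here is hypothetical.

    edge: return earned per unit of equity each period
    decay: fraction of existing edge lost each period
    research_yield: new edge created per unit of equity share reinvested
    reinvest_fraction: share of each period's profit spent on improvement
    """
    equity = 1.0
    for _ in range(periods):
        profit = equity * edge
        reinvested = reinvest_fraction * profit
        equity += profit - reinvested           # retained earnings
        created = research_yield * reinvested / equity
        edge = edge * (1.0 - decay) + created   # old edge fades, new edge arrives
    return equity, edge


start_edge = 0.02
_, edge_without = simulate_loop(start_edge, 0.05, 0.2, reinvest_fraction=0.0, periods=20)
_, edge_with = simulate_loop(start_edge, 0.05, 0.2, reinvest_fraction=0.5, periods=20)

print(edge_without < start_edge, edge_with > start_edge)  # prints True True
```

With no reinvestment the edge only decays. With reinvestment and a high enough yield, creation outruns decay and the edge grows. Lower `research_yield` far enough and the reinvesting firm also shrinks, which is the case a create-versus-decay score is meant to detect.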

What This Means for Builders

The builder takeaway is not “copy AlphaFund.” It is “instrument the loop.”

If you want a system to improve itself, you need logs that connect actions to outcomes. You need a clean separation between training information and future evaluation. You need every candidate improvement to pass through a gate that asks whether it improved the objective out of sample. You need to measure decay, not just creation. And you need capital allocation across the improvement channels, not a pile of disconnected experiments.
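The out-of-sample gate can be sketched as a paired comparison on held-out periods. This is a minimal illustration, not a method from the paper: the function `passes_gate`, the threshold of 2.0, and the sample scores are all assumptions.

```python
from math import sqrt
from statistics import mean, stdev


def passes_gate(baseline, candidate, min_t=2.0):
    """Accept a change only if its held-out improvement is clear of noise.

    baseline and candidate are per-period scores on the same held-out periods,
    none of which were used to build the candidate.
    """
    if len(baseline) != len(candidate) or len(baseline) < 2:
        raise ValueError("need two or more paired held-out periods")
    diffs = [c - b for b, c in zip(baseline, candidate)]
    spread = stdev(diffs)
    if spread == 0:
        return mean(diffs) > 0
    t_stat = mean(diffs) / (spread / sqrt(len(diffs)))
    return t_stat >= min_t


# Hypothetical held-out scores for the current system and two candidates.
baseline = [0.10, 0.12, 0.08, 0.11, 0.09]
steady = [0.13, 0.14, 0.11, 0.13, 0.12]   # small but consistent improvement
erratic = [0.20, 0.02, 0.15, 0.03, 0.11]  # large swings, little net gain

print(passes_gate(baseline, steady), passes_gate(baseline, erratic))
# prints True False
```

Rerunning the same gate on later windows is one simple way to measure decay as well as creation: a change that passed once but fails on fresh periods has lost its edge.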

This is especially relevant for agentic companies. A lot of agent infrastructure focuses on task completion: did the agent call the tools, finish the workflow, and produce an answer? That is necessary, but it is not enough for compounding. The harder question is whether the system learns which workflows, data, tools, policies, and model updates create durable economic improvement.

In that sense, the whitepaper is really an argument for API-complete operations, meaning every step of the loop runs through software. If the firm cannot observe, act, log, evaluate, and reinvest through software, the differentiable-corporation story breaks.

What This Means for Buyers and Operators

For operators, the paper gives a useful test for AI transformation claims.

Do not ask only whether a system uses advanced models. Ask whether the improvement loop is measured. What does the system observe? What decisions can it actually make? What objective does it optimize? What gets held out? How does it know when old edge has decayed? How are failed experiments counted? How does it decide whether the next dollar should go to data, tooling, compute, hiring, training, distribution, or execution?

Most organizations will not have t-RSI. Their outcomes are slower, noisier, and mediated by humans, contracts, sales cycles, customer behavior, and organizational politics. That is exactly the point. The paper argues that quant trading is unusually measurable. In a normal SaaS company, an improvement to the product may affect retention months later, filtered through onboarding, support, pricing, procurement, and customer context.

So the broader lesson is not that every company should compute t-RSI. It is that most companies claiming AI-driven self-improvement probably lack the measurement substrate to justify the claim.

What to Watch Next

The field should watch whether AlphaFund or similar firms can expose stronger public evidence without giving away proprietary trading edge. Independent audits, sealed evaluation protocols, clearer live-versus-backtest separation, and repeated measurement across market regimes would make the framework more persuasive.

Builders should watch whether the EWM idea travels beyond trading. The concept is strongest where actions settle quickly and outcomes are priced directly. It gets harder in domains where value is delayed, customer-mediated, or politically allocated.

Operators should also watch for the capacity question. A strategy can look self-improving at small scale and then flatten when market impact, competition, financing, or operational constraints absorb the gains. The paper discusses those forces, but they are exactly where future proof will have to be found.

Limitations and Caveats

The main limitation is verification. This is a company whitepaper about the company’s own operating system. It is detailed, but many decisive quantities are proprietary or only partially visible.

The second limitation is evidence type. Some of the empirical claims come from backtests, fitted scaling laws, filtered experiment cohorts, and internal ledgers. Those can be useful, especially with strict holdouts, but they are not the same as broad independent replication.

The third limitation is selection. The R&D section tracks improvement along a top-10% frontier of experiments. That may be a reasonable way to price a search process, but it is not proof that the underlying research distribution itself is improving at the same rate.

The fourth limitation is generalization. The paper is most convincing in quant trading because the feedback loop is unusually clean. Its broader claim about autonomous self-improving corporations is more speculative.

The best reading is therefore measured: AlphaFund has not proven that recursive self-improvement is solved. It has offered a serious way to ask what proof would need to look like.

Source

Westenhaver, York, Massey Branscomb, and Aidan Grant. (2026). Recursive Self-Improvement is a Portfolio Optimization Problem. AlphaFund whitepaper. Available at: https://www.alphafund.com/whitepaper
