Executive summary

AI inference gateways are the control plane for multi-model AI. The simple version is a proxy: one API that routes requests to many model providers. The more important version is a management layer for model traffic: policy, fallback, spend, privacy, observability, audit, billing, and provider choice.

The category exists because AI application development is no longer a single-provider problem. Teams choose models by task, cost, latency, context length, modality, tool support, privacy policy, geography, and uptime. That creates infrastructure work. A team can scatter that logic through application code, build an internal proxy, buy a managed router, self-host an open-source gateway or rely on a cloud model platform.

OpenRouter is the most visible managed marketplace. LiteLLM is the primary self-hosted reference point. Portkey, Helicone, Kong, Cloudflare, Vercel, AWS, Google and Microsoft each attack the category from a different starting position. This is the core shape of the market: every serious AI platform has a reason to own the routing and governance layer.

The bull case is that AI gateways become as standard as API gateways were for web services. The bear case is that the function gets absorbed by clouds, API-management incumbents, open-source self-hosting, and application frameworks. The category is real either way; the question is who captures durable value. Sources: Docs: AI Gateway and Amazon: Bedrock

Why now

The first reason is model fragmentation. OpenRouter’s public docs describe a catalog with hundreds of models, provider routing controls, and model metadata. Its Models API check during this research returned hundreds of entries across dozens of provider prefixes. The count changes, but the operational reality stays the same: model choice is dynamic. Sources: Openrouter documentation and Openrouter: Models

The second reason is production spend. OpenRouter’s June 2025 financing announcement stated that routed annual-run-rate inference spend grew from $10M in October 2024 to more than $100M as of May 2025. That is company-stated routed spend, not revenue, but it shows the budget line is becoming material. Source: Globenewswire: OpenRouter Raises 40 Million To Scale Up Multi Model Inference For Enterprise

The third reason is enterprise governance. Once prompts include customer data, code, documents, legal text, or internal decisions, model calls need policy. OpenRouter documents Zero Data Retention routing; LiteLLM documents enterprise gateway features; Cloudflare and Kong position AI gateway features around control and observability. Sources: Openrouter documentation, Docs documentation, Cloudflare: AI Gateway, Docs: AI Gateway

Market definition

An AI inference gateway is the data-plane layer between applications and model providers. It receives a request, normalizes it, applies provider/model policy, routes it, handles failures and returns the result.

An AI control plane is the management layer for that traffic. It answers questions such as: who can call which model, what data policy applies, how much can be spent, which provider is preferred, what happened during an incident, and what record exists for audit.

The category includes several product types:

  • Managed model marketplaces such as OpenRouter.
  • Open-source/self-hosted gateways such as LiteLLM.
  • Enterprise control planes such as Portkey and Kong AI Gateway.
  • Observability-led gateways such as Helicone.
  • Edge/cloud gateways such as Cloudflare AI Gateway.
  • Cloud model platforms such as AWS Bedrock, Google Vertex AI, and Azure AI Foundry.
  • Application-level abstractions such as Vercel AI SDK.

Value chain

The upstream layer is compute and model supply: GPU capacity, cloud infrastructure, model labs, open-weight models, and hosted inference providers.

The middle layer is the gateway/control-plane market. This is where model choice becomes operational policy. The layer decides which provider gets a request, how failures are handled, what gets logged, whether data retention is allowed, which budget applies, and what usage record is produced.

The downstream layer is the application: copilots, agents, internal tools, automation products, customer-facing AI apps, and developer platforms.

The control point is valuable because technical decisions here are business decisions. A model call is more than a function call. It carries cost, latency, privacy, compliance, user experience, vendor exposure, and operational risk. Source anchors: Openrouter documentation and Nist: AI Risk Management Framework

Buyer and budget

The early buyer is a developer or founder who wants access to multiple models without managing many provider accounts. The product-led wedge is convenience.

The production buyer is a platform engineer, AI infrastructure lead, head of engineering, or CTO. This buyer cares about uptime, fallback, observability, cost control, key management and developer experience.

The enterprise buyer includes security, compliance, procurement and data-governance stakeholders. Once prompts and logs contain sensitive data, the gateway becomes part of the risk surface.

The budget path starts in experimentation and moves toward cloud/infrastructure COGS as usage scales. This is why the category can look small at first and become strategic later: it starts as a developer tool, then becomes spend governance. Source: Globenewswire: OpenRouter Raises 40 Million To Scale Up Multi Model Inference For Enterprise

Competition

OpenRouter is the managed marketplace/router. Its advantage is breadth, developer convenience, model discovery, and provider routing. Its risk is that sophisticated customers may graduate to direct contracts or self-hosted gateways.

LiteLLM is the open-source/self-hosted reference. Its advantage is control. A platform team can run the gateway inside its own infrastructure. That is attractive when privacy, cost, or customization matter. Source: GitHub: BerriAI/litellm

Portkey and Helicone approach the market from control and observability. They are less about model marketplace liquidity and more about production operations: traces, policies, routing, logs, and governance.

Cloudflare, Kong, AWS, Google and Microsoft attack from distribution. They already sit inside enterprise infrastructure budgets. Their advantage is procurement and trust. Their weakness is that they may be less neutral or less broad than a dedicated multi-model router.

Vercel AI SDK and similar framework abstractions compete from the application-code layer. For many teams, a framework-level abstraction is enough until traffic volume, compliance, or model spend forces the problem into infrastructure.

Challengers

The challenger map is straightforward. OpenRouter pushes marketplace liquidity and managed routing. LiteLLM pushes self-hosted control. Portkey and Helicone push production operations through observability, gateway behavior, and governance. Kong and Cloudflare sit between challenger and incumbent because they are established infrastructure companies adding AI-specific gateway layers. Sources: Openrouter documentation, GitHub: BerriAI/litellm, Portkey documentation, Docs, Docs: AI Gateway, Cloudflare: AI Gateway

Where profit pools accrue

The thin part of the market is API translation. If all a gateway does is translate one model schema into another, the value likely compresses. Sources: Openrouter documentation and Docs documentation

The durable profit pools are different:

  • Governance: role controls, audit logs, privacy policy, data retention, and compliance support.
  • Spend management: budgets, model substitution, provider choice, caching and reporting.
  • Reliability: fallback, retries, routing health, and incident visibility.
  • Procurement: one bill, one contract surface, or cloud marketplace attachment.
  • Data: model/provider performance, usage patterns, and routing outcomes.

The strongest companies will combine several of these. A pure proxy is fragile. A trusted control plane is harder to replace. Source anchors: Docs documentation and Docs: AI Gateway

Regulation and policy

Regulation is not about a single law creating the market. Enterprise AI usage requires records, oversight, privacy boundaries, and operational accountability. The EU AI Act and NIST AI RMF both push organizations toward clearer risk-management practices. Sources: Artificialintelligenceact: The Act and Nist: AI Risk Management Framework

For gateways, this matters because prompt traffic is where policy can be enforced. A policy document does not stop a developer from sending sensitive data to a non-approved endpoint. A gateway can.

Technology shifts

The OpenAI-compatible API is the practical integration standard, but it is not perfect. Tool calls, structured outputs, reasoning controls, image/audio input, context behavior, caching and provider-specific constraints can leak through the abstraction. Sources: Openrouter documentation and Docs documentation

Agents make the problem sharper. Agent systems can generate many background model calls. They need cheaper planning calls, stronger fallback, logging, tool-use governance and policy enforcement across long workflows. That makes gateway/control-plane features more important, but it may also exceed what a simple request/response proxy can handle.

Bear case

The strongest bear case is absorption. Hyperscalers can add routing and guardrails to model marketplaces. API-management vendors can add AI plugins. Open-source gateways can serve sophisticated teams. App frameworks can solve the problem for lighter workloads.

In that world, standalone vendors still exist, but the category fragments. OpenRouter wins developer marketplace traffic; LiteLLM wins self-hosted teams; Cloudflare wins edge users; AWS/Microsoft/Google win cloud-native enterprises; Portkey/Helicone/Kong win specific control or observability accounts.

The second bear case is standardization. If model APIs converge and providers handle more governance directly, API translation becomes less valuable. The category then shifts toward governance, evals, observability and spend management.

Bull case

The bull case is that multi-model AI becomes permanent. Teams do not want to hard-code provider choice. They want policy statements: use this provider first, fall back to that one, avoid non-ZDR endpoints, cap spend, log everything, route low-value work to cheaper models and alert when quality or latency changes. Sources: Openrouter documentation and Openrouter documentation

What would change the conclusion

The bull case gets stronger if enterprises standardize on standalone gateways as the system of record for AI traffic. It gets weaker if mature customers mostly self-host, move to cloud-native controls, or handle routing inside application frameworks. The most important evidence would be named enterprise standardization, retention at high inference spend, and compliance teams treating gateway logs as audit records.

If that becomes normal, the gateway becomes the system of record for AI usage. That is a large prize. It makes gateways less like SDK helpers and more like API gateways, identity systems, and observability platforms.

What to watch next

  • Whether enterprises standardize on standalone gateways or cloud-native gateways.
  • Whether OpenRouter can keep large customers as spend scales.
  • Whether LiteLLM becomes the default self-hosted standard.
  • Whether Cloudflare, Kong, AWS, Google and Microsoft bundle enough functionality to absorb the market.
  • Whether gateway logs become accepted audit/compliance records.
  • Whether agent workloads push the category toward stateful control planes, not stateless proxies.
  • Whether model rankings, usage data, and provider performance become defensible data assets.

Sources / further reading