Seven months ago, Fireworks AI closed a $250 million Series C at a $4 billion valuation. On May 27, Bloomberg reported that the company is in talks to raise a new round at $15 billion. That is not a rounding error. It is a nearly fourfold jump in the price of the same company within a single market cycle. Explaining it means looking past the headline at what actually changed: not Fireworks, but the market's view of what AI inference infrastructure is worth.
Inference is the unglamorous half of the AI stack. Training gets the announcements and the Nobel prizes. Inference is what happens every time a user sends a message, a recommendation engine scores a product, or an enterprise agent executes a workflow. It is the process of running a trained model against new inputs, and it is where the actual cost of AI lives. As enterprise adoption has scaled from experiment to production, inference has quietly become one of the largest and fastest-growing spending categories in technology. The market for running AI models was valued at roughly $104 billion in 2025 and is projected to reach more than $312 billion by 2034. That trajectory is the bet underneath Fireworks' valuation.
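The two market-size figures imply an annual growth rate the paragraph leaves unstated. A minimal Python sketch, assuming 2025 is the base year and 2034 the end year (nine compounding periods), and using $312 billion as the end value even though the source says "more than":

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate that turns start_value into end_value over `years` periods."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1

# Figures from the text, in billions of US dollars; 2025 -> 2034 is nine compounding years.
cagr = implied_cagr(104.0, 312.0, 2034 - 2025)
print(f"Implied CAGR: {cagr:.1%}")  # roughly 13% per year
```

Because the projection is a floor ("more than $312 billion"), the computed rate is a lower bound on what the forecast implies.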
What Fireworks Actually Sells
Fireworks was founded in 2022 by seven engineers who built PyTorch at Meta. Lin Qiao, the CEO, was previously Head of PyTorch at Meta. Her co-founders ran the PyTorch compiler, core maintenance, and ads infrastructure teams before leaving to rebuild the same stack outside the walls of a hyperscaler. The founding logic was straightforward: the people who built the most widely used AI training framework in the world understood, better than anyone, where the bottlenecks in production inference would appear. They built a company to solve them before the rest of the market recognized the problem.
The core product is an inference cloud for open-source and enterprise AI models. What differentiates Fireworks technically is FireAttention — a suite of custom CUDA kernels engineered specifically for inference workloads. On DeepSeek V4 Pro, the most widely deployed frontier open-weight model of 2026, independent benchmarks from Artificial Analysis show Fireworks delivering throughput of 167 to 174 tokens per second — roughly five times faster than comparable providers at the same price point, while preserving the model’s full 1 million token context window. Its FireAttention V4, released in February 2026, pushed performance further still, achieving more than 250 tokens per second on NVIDIA B200 GPUs using FP4 precision. As of early 2026, the platform processes more than 15 trillion tokens per day and sustains approximately 180,000 requests per second across its global footprint.
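The daily token count is easier to compare with the per-second figures once it is converted. The sketch below assumes load is spread evenly across the day and treats the 180,000 requests per second as an average rather than a peak, which the source does not specify; the tokens-per-request result is therefore only a rough illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def average_rate_per_second(per_day: float) -> float:
    """Convert a daily total into an average per-second rate (assumes uniform load)."""
    return per_day / SECONDS_PER_DAY

tokens_per_day = 15e12         # "more than 15 trillion tokens per day"
requests_per_second = 180_000  # "approximately 180,000 requests per second"

tokens_per_second = average_rate_per_second(tokens_per_day)
# Assumption: both figures are averages, so their ratio approximates tokens per request.
tokens_per_request = tokens_per_second / requests_per_second

print(f"~{tokens_per_second / 1e6:.0f} million tokens per second on average")
print(f"~{tokens_per_request:.0f} tokens per request if both figures are averages")
```

If 180,000 requests per second is a peak figure, the true average tokens per request would be higher than this estimate.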
“Fireworks is the only platform that delivers all three: state-of-the-art open-source models, sub-second inference at scale, and the ability to own and differentiate your AI.”
— Lin Qiao, CEO and Co-Founder, Fireworks AI, October 2025
The Revenue That Justifies the Ask
The valuation conversation starts with the growth rate. By February 2026, Fireworks had reached $315 million in annualized recurring revenue, up 416% year over year. Almost no enterprise software company of that size grows at that pace. The customer base exceeds 10,000 companies, with named enterprise accounts including Uber, Shopify, DoorDash, Cursor, Perplexity, Notion, and GitLab. For context, the company raised its Series B in July 2024 at a $552 million valuation. Sixteen months later, it was closing a round at $4 billion. Seven months after that, Bloomberg is reporting $15 billion. The speed of the repricing is not driven by hype; it is driven by a revenue growth curve that keeps outrunning each previous valuation by a wider margin.
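The growth and repricing figures can be checked with simple arithmetic. This sketch assumes "up 416%" means ARR grew by 416% (a 5.16x multiple), not to 416% of the prior level; the helper names are illustrative, not from any source.

```python
def prior_year_value(current: float, growth_pct: float) -> float:
    """Back out last year's value from this year's value and percentage growth."""
    return current / (1 + growth_pct / 100)

def step_ups(valuations: list[float]) -> list[float]:
    """Multiple between each consecutive pair of valuations."""
    return [later / earlier for earlier, later in zip(valuations, valuations[1:])]

arr_now = 315.0  # $M, February 2026 ARR
arr_year_ago = prior_year_value(arr_now, 416.0)
print(f"Implied ARR a year earlier: ~${arr_year_ago:.0f}M")

# Series B, Series C, and the reported new round, in $M.
rounds = [552.0, 4_000.0, 15_000.0]
print([f"{m:.2f}x" for m in step_ups(rounds)])
```

Setting the roughly fivefold annual ARR multiple beside each valuation step-up is one way to test how closely revenue has tracked the repricing.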
The Microsoft partnership, which entered public preview in March 2026, adds a distribution channel that changes the customer-acquisition calculus. Fireworks is now integrated directly into Microsoft Foundry, Azure's unified AI development platform. Enterprise customers building on Azure can access Fireworks inference natively, without a separate vendor relationship, alongside models including DeepSeek V3.2, Kimi K2.5, and OpenAI's gpt-oss-120b. Microsoft cited Fireworks' production metrics (13 trillion tokens per day, 180,000 requests per second, and more than 1,000 tokens per second on large models) as the rationale for selecting it as the platform's open-model inference partner. That is not a co-marketing arrangement. It is Microsoft certifying the infrastructure as enterprise-grade and embedding it in the sales motion of the world's largest enterprise software company.

This Is Not a Single-Company Story
The Fireworks repricing is the most dramatic data point in a sector-wide revaluation that has been building since late 2025. Baseten, which positions itself as an inference platform for production ML teams, raised at a $5 billion valuation in January 2026 — more than doubling its September 2025 valuation of $2.15 billion. By late May, it was in talks for a further raise at $11 billion. Modal Labs, a cloud platform for AI inference workloads, was in discussions to raise at $2.5 billion in February — more than double its previous valuation in under five months. Three separate companies, three separate investor syndicates, all repricing inference infrastructure on roughly the same timeline.
The hardware layer has moved even faster. NVIDIA acquired Groq, the inference chip startup, for $20 billion in December 2025. OpenAI acquired Cerebras, the wafer-scale chip company, for $20 billion around the same period. Those acquisitions established a price floor for hardware-layer inference assets and sent a clear message to investors evaluating software-layer plays: the people running the AI buildout have decided inference is strategic infrastructure worth owning at any price. The software platforms — Fireworks, Baseten, Modal — are the layer between the hyperscaler GPU pools and the enterprise applications actually running in production. As hyperscaler AI capex pushes toward $690 billion, the premise is that more compute creates more inference demand, and the platforms that can route that demand efficiently will capture significant margin.
The Questions the $15 Billion Number Leaves Open
The current reporting leaves several things unresolved.
First, the terms. Bloomberg's report cites people familiar with the matter, and the deal has not closed. Private-market valuation headlines can carry structured terms (liquidation preferences, anti-dilution provisions, ratchets) that protect late investors at the expense of earlier stakeholders. The headline number and the economics delivered to a Series C investor are not necessarily the same calculation.
Second, the revenue multiple. At $315 million in ARR and a $15 billion valuation, Fireworks is being priced at roughly 48 times revenue. That premium is consistent with the fastest-growing enterprise infrastructure companies in the current cycle, but it prices in continued execution at a pace almost no company sustains for more than a few quarters.
Third, competitive durability. Fireworks competes with the native inference offerings of every major cloud provider, including AWS, Google Cloud, and even Azure, its own distribution partner, all of which have the distribution, the customer relationships, and the GPU pools to build comparable tooling. The moat is technical today. Whether custom CUDA kernels remain a durable differentiator as hyperscaler inference tooling matures is a question the $15 billion valuation assumes will resolve in Fireworks' favor.
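The revenue-multiple figure in the second point comes down to a single division. The sketch below reproduces it and, as a purely hypothetical comparison point not taken from the source, shows the ARR at which the same $15 billion valuation would equal 20 times revenue.

```python
def revenue_multiple(valuation: float, arr: float) -> float:
    """Valuation divided by annualized recurring revenue (same units)."""
    return valuation / arr

def arr_needed(valuation: float, target_multiple: float) -> float:
    """ARR at which a fixed valuation would equal target_multiple times revenue."""
    return valuation / target_multiple

valuation = 15_000.0  # $M, reported headline valuation
arr = 315.0           # $M, February 2026 ARR
print(f"Headline multiple: ~{revenue_multiple(valuation, arr):.0f}x ARR")

# Hypothetical comparison point, not from the source: a 20x multiple.
print(f"ARR needed for 20x at the same valuation: ${arr_needed(valuation, 20.0):,.0f}M")
```

Framed this way, the multiple shows how much revenue growth the headline price already assumes before later investors see a lower multiple.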
None of that makes the repricing irrational. It makes it a bet — a well-supported, data-backed bet on the trajectory of enterprise AI spending, the durability of open-model adoption, and the staying power of a software layer that has spent three years building the infrastructure the hyperscalers have mostly chosen to rent from it rather than rebuild. The round has not closed. When it does, the terms will matter as much as the number.
What to Watch Next
- Round close confirmation and disclosed terms. The $15 billion figure comes from people familiar with the matter and could still change. Watch for Index Ventures, reported as a co-lead, to confirm its role, and for any named co-investors, particularly whether NVIDIA or AMD, both existing Fireworks backers, participate again at the new valuation.
- ARR trajectory through Q2 2026. The $315 million figure reflects February data. The first post-round revenue disclosure — whether in a press release, a partner announcement, or a secondary market filing — will show whether the 416% growth pace has held into the spring or whether it is beginning to moderate at scale.
- Baseten's $1 billion round outcome. If Fireworks and Baseten both close major rounds within the same quarter at dramatically higher valuations, that would remove any remaining doubt that this is a single-company repricing and confirm a category-wide revaluation, with direct implications for how investors price inference exposure through public-market proxies such as NVIDIA, CoreWeave, and Microsoft.
- Hyperscaler native inference competitive moves. AWS, Google Cloud, and Azure all have the distribution and compute assets to compete directly with Fireworks. Watch for any acceleration in hyperscaler inference tooling announcements — particularly any move to offer open-model fine-tuning and deployment in a single product — that would challenge the independent platform thesis at its core.
- FireAttention V5 or next-generation kernel release. Fireworks’ technical advantage is built on a continuous cadence of kernel improvements tied to new NVIDIA hardware generations. Any announcement tied to Vera Rubin (NVIDIA’s next architecture) will indicate whether the performance gap can be maintained as the hardware substrate evolves.
