Fireworks AI launches a training platform supporting full-parameter training up to trillion-parameter models

MarketWhisper

Fireworks AI Training Platform

Fireworks AI has released a preview version of Fireworks Training, expanding the company’s positioning from a pure inference infrastructure provider into an “end-to-end training + deployment” platform. Founded by Lin Qiao, a former Meta engineer who helped build PyTorch, this AI infrastructure company is currently valued at $4 billion and processes 1.5 trillion tokens per day.

Three-Tier Training Architecture: Full Coverage From No-Code Operations to Research-Grade Customization

Fireworks Training’s three-tier architecture is designed for users with different technical backgrounds, enabling the product team, ML engineers, and researchers to complete the entire workflow from training to deployment on a single platform:

Function Positioning of Three Service Tiers

Training Agent (No-Code Layer): For product teams without ML infrastructure; users describe the task and upload data to complete an end-to-end workflow. Currently supports LoRA fine-tuning

Managed Training (Engineer Layer): For ML engineers, supports SFT, DPO, and reinforcement learning fine-tuning, including full-parameter training capability

Training API (Research Layer): For research teams, allows customizing loss functions and training loops, and supports reinforcement learning algorithms such as GRPO and DAPO

The scale of full-parameter training spans widely, from a single-node Qwen3 8B to trillion-parameter models like Kimi K2.5 running on 64 NVIDIA B200 GPUs, covering the complete scale range of today’s mainstream open-source models.
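The LoRA fine-tuning offered at the no-code tier can be illustrated generically. The sketch below is not Fireworks’ implementation; it is the standard low-rank adaptation idea in NumPy, with hypothetical sizes (hidden dimension 64, rank 8): the pretrained weight stays frozen while only two small matrices are trained, and zero-initializing one of them makes the adapter a no-op at the start of training.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 8                            # hidden size and adapter rank (illustrative values)
W = rng.standard_normal((d, d))         # frozen pretrained weight, never updated
A = rng.standard_normal((r, d)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                    # trainable up-projection, zero-initialized

def lora_forward(x, alpha=16):
    # Base layer output plus a scaled low-rank update: x W^T + (alpha/r) x A^T B^T.
    # Because B starts at zero, the adapter initially leaves the model unchanged.
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.standard_normal((2, d))
# With B zero-initialized, the adapted layer matches the frozen base layer exactly.
assert np.allclose(lora_forward(x), x @ W.T)
```

Because only A and B (a few percent of the base layer’s parameters) receive gradients, LoRA keeps memory and compute low enough for the no-code tier, while full-parameter training at the higher tiers updates W itself.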

Three Major Customer Cases: Quantifiable Performance Data in Production Environments

Among Fireworks AI’s current inference customers, three top AI application companies have already completed cutting-edge reinforcement learning training and published specific performance data.

Vercel: Trained an automatic code-fixing model for its code-generation product v0. The correct-code generation rate is 93%, versus 62% for Claude 3.5 Sonnet under the same conditions. End-to-end latency improved 40x compared with the closed-source models previously used.

Genspark: Performed reinforcement learning fine-tuning on the trillion-parameter open-source model Kimi K2 to build a deep research agent. Tool call volume increased by 33%, and inference cost decreased by 50%.

Cursor: Completed reinforcement learning training for Composer 2 in a distributed manner across 3 to 4 global clusters. It currently ranks #1 on CursorBench and has achieved shared GPU resource pooling between training and production inference.

Core Technical Difference: Numerical Consistency Between Training and Inference

The core technical differentiator emphasized by Fireworks AI is “numerical consistency” between training and inference. For MoE (mixture-of-experts) models, even small numerical deviations in hidden states can produce cascading amplification effects in expert routing decisions, so that behaviors learned in the training environment cannot be fully reproduced at inference time.
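The routing-amplification effect can be shown with a toy example. Assuming a hypothetical 4-expert router whose top two logits are nearly tied, a deviation on the order of 1e-4 (the kind of difference a different kernel or precision can introduce) is enough to send the token to a different expert:

```python
import numpy as np

# Router logits for 4 experts; experts 1 and 2 are nearly tied.
logits_train = np.array([0.10, 1.2000, 1.2001, -0.5])

# A tiny numerical deviation, e.g. from a different matmul kernel at inference.
logits_infer = logits_train + np.array([0.0, 2e-4, -2e-4, 0.0])

top_train = int(np.argmax(logits_train))  # expert selected during training
top_infer = int(np.argmax(logits_infer))  # expert selected at inference

# A ~1e-4 perturbation re-routes the token to a different expert,
# and that expert's output then feeds every subsequent layer.
assert top_train != top_infer
```

Once the token goes through a different expert, the downstream hidden states diverge further, which is why the deviation cascades rather than staying small.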

Fireworks publishes the KL divergence between training and inference outputs for every supported model; all values are below 0.01. This gives developers a quantifiable, consistent benchmark for assessing how stable model behavior will remain when transferring from training to production deployment.

Frequently Asked Questions

What company is Fireworks AI?

Fireworks AI is an AI inference infrastructure company founded by Lin Qiao, a former Meta engineer who helped build PyTorch. The company is currently valued at $4.0 billion, processes 1.5 trillion tokens per day, and its core customers include mainstream AI applications such as Cursor, Vercel, and Genspark.

Which types of users is Fireworks Training’s three-tier architecture each suited for?

Training Agent is for product teams without ML infrastructure (no-code operations); Managed Training is for ML engineers (supports SFT, DPO, and full-parameter training for reinforcement learning); Training API is for research teams (allows customizing loss functions and training loops, and supports algorithms such as GRPO and DAPO).

Why does Fireworks AI emphasize that KL divergence is below 0.01?

KL divergence measures numerical deviation between the training and inference environments. The larger the deviation, the more unstable the model’s behavior becomes after deployment. This is especially critical for MoE models—small deviations can be amplified into routing-decision differences. Fireworks AI enables developers to objectively evaluate the quality of consistency from training to deployment by publishing quantifiable metrics.
