INTELLECT-2: A Reasoning Model Trained Through Globally Decentralized Reinforcement Learning
Prime Intellect Team, Sami Jaghouar, Justus Mattern, Jack Min Ong, Jannik Straube, Manveer Basra, Aaron Pazdera, Kushal Thaman, Matthew Di Ferrante, Felix Gabriel, Fares Obeid, Kemal Erdem, Michael Keiblinger, Johannes Hagemann

TL;DR
INTELLECT-2 is a 32-billion-parameter reasoning model trained with fully asynchronous reinforcement learning across a dynamic, heterogeneous swarm of permissionless compute contributors, supported by new infrastructure and training techniques.
Contribution
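The paper presents what the authors describe as the first globally distributed RL training run of a 32-billion-parameter language model. It introduces the PRIME-RL training framework with its TOPLOC and SHARDCAST components, along with modifications to the standard GRPO training recipe and data filtering techniques.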
Findings
Achieved state-of-the-art reasoning performance in 32B models.
Developed and open-sourced a complete decentralized training infrastructure.
Demonstrated successful training of a large-scale reasoning model without centralized control.
Abstract
We introduce INTELLECT-2, the first globally distributed reinforcement learning (RL) training run of a 32 billion parameter language model. Unlike traditional centralized training efforts, INTELLECT-2 trains a reasoning model using fully asynchronous RL across a dynamic, heterogeneous swarm of permissionless compute contributors. To enable a training run with this unique infrastructure, we built various components from scratch: we introduce PRIME-RL, our training framework purpose-built for distributed asynchronous reinforcement learning, built on top of novel components such as TOPLOC, which verifies rollouts from untrusted inference workers, and SHARDCAST, which efficiently broadcasts policy weights from training nodes to inference workers. Beyond infrastructure components, we propose modifications to the standard GRPO training recipe and data filtering techniques that were…
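For readers unfamiliar with the "standard GRPO training recipe" mentioned in the abstract, the sketch below illustrates its central idea as it is commonly described: several rollouts are sampled for the same prompt, and each rollout's reward is normalized against the mean and standard deviation of its own group. This is a minimal illustration, not the authors' implementation. The function name `group_relative_advantages`, the `eps` constant, and the choice of population standard deviation are assumptions made here, and the paper's own modifications to the recipe are not described in this excerpt and are not reflected.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of rollouts for the same prompt.

    Hypothetical helper for illustration only: each advantage is the
    reward minus the group mean, divided by the group standard deviation.
    `eps` avoids division by zero when all rewards in the group are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four rollouts for one prompt, scored by a binary verifier.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# A group where every rollout gets the same reward carries no signal.
flat = group_relative_advantages([1.0, 1.0])
```

Correct rollouts receive positive advantages and incorrect ones negative advantages, while a group with identical rewards yields all-zero advantages and therefore contributes no gradient signal.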
Taxonomy
Topics: Reinforcement Learning in Robotics · Topic Modeling · Multimodal Machine Learning Applications
