Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model
Ling Team, Anqi Shen, Baihui Li, Bin Hu, Bin Jing, Cai Chen, Chao Huang, Chao Zhang, Chaokun Yang, Cheng Lin, Chengyao Wen, Congqi Li, Deng Zhao, Dingbo Yuan, Donghai You, Fagui Mao, Fanzhuang Meng, Feng Xu, Guojie Li, Guowei Wang, Hao Dai, Haonan Zheng, Hong Liu, Jia Guo

TL;DR
This paper introduces Ring-1T, a trillion-parameter reasoning model trained with new reinforcement learning (RL) techniques. It achieves state-of-the-art benchmark results and makes large-scale reasoning capabilities openly available.
Contribution
It presents three innovations for training trillion-scale models efficiently and reliably: the IcePop and C3PO++ training methods and the ASystem RL framework. It also releases the first open-source trillion-parameter reasoning model.
Findings
Achieved high benchmark scores across multiple reasoning tasks.
Demonstrated stable training of trillion-parameter models.
Enabled open access to a state-of-the-art reasoning model.
Abstract
We present Ring-1T, the first open-source, state-of-the-art thinking model with trillion-scale parameters. It has 1 trillion total parameters and activates approximately 50 billion per token. Training models at this scale introduces unprecedented challenges, including training-inference misalignment, inefficient rollout processing, and bottlenecks in the RL system. To address these, we pioneer three interconnected innovations: (1) IcePop stabilizes RL training via token-level discrepancy masking and clipping, resolving the instability caused by training-inference mismatches; (2) C3PO++ improves resource utilization for long rollouts under a token budget by dynamically partitioning them, thereby achieving high time efficiency; and (3) ASystem, a high-performance RL framework designed to overcome the systemic bottlenecks that impede trillion-parameter model training.…
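To make the first two ideas in the abstract more concrete, here is a minimal, non-authoritative Python sketch of each under stated assumptions. For IcePop, we assume the per-token discrepancy is the probability ratio p_train / p_infer between the training engine and the inference engine. Tokens whose ratio falls outside a band [low, high] are masked (their gradient weight is clipped to zero), and the kept tokens are importance-weighted by that ratio. The thresholds 0.5 and 5.0 and the function name `icepop_weights` are hypothetical. For C3PO++, we assume "dynamic partitioning under a token budget" means that each generation step stops once a fixed token budget is spent. Finished rollouts go to training, and unfinished ones keep their progress and resume in the next step. `Rollout`, `target_len`, and `budgeted_generation_step` are illustrative names, not the paper's API.

```python
import math
from dataclasses import dataclass


# --- IcePop-style token-level discrepancy masking (assumed formulation) ---
def icepop_weights(train_logprobs, infer_logprobs, advantages, low=0.5, high=5.0):
    """Per-token policy-gradient weights.

    ratio = p_train / p_infer for each sampled token. Tokens whose ratio lies
    outside [low, high] get weight 0.0 (masked); the rest are weighted by
    ratio * advantage. `low` and `high` are hypothetical defaults.
    """
    weights = []
    for lp_train, lp_infer, adv in zip(train_logprobs, infer_logprobs, advantages):
        ratio = math.exp(lp_train - lp_infer)  # p_train / p_infer
        keep = low <= ratio <= high
        weights.append(ratio * adv if keep else 0.0)
    return weights


# --- C3PO++-style token-budgeted rollout partitioning (assumed formulation) ---
@dataclass
class Rollout:
    rid: int
    target_len: int  # hypothetical: length at which this rollout would emit EOS
    generated: int = 0


def budgeted_generation_step(active, budget):
    """Decode active rollouts in lockstep until `budget` tokens are spent.

    Returns (finished, unfinished). Finished rollouts can be trained on now;
    unfinished rollouts keep their progress and resume in the next step.
    """
    remaining = budget
    running = list(active)
    finished = []
    while running and remaining > 0:
        next_running = []
        for r in running:
            if remaining == 0:  # budget exhausted mid-sweep: pause this rollout
                next_running.append(r)
                continue
            r.generated += 1  # emit one token
            remaining -= 1
            if r.generated == r.target_len:
                finished.append(r)
            else:
                next_running.append(r)
        running = next_running
    return finished, running


if __name__ == "__main__":
    # Second token: ratio 0.1 / 0.9 is below `low`, so it is masked.
    print(icepop_weights([math.log(0.5), math.log(0.1)],
                         [math.log(0.5), math.log(0.9)], [2.0, 2.0]))  # [2.0, 0.0]
    pool = [Rollout(0, 3), Rollout(1, 10), Rollout(2, 4)]
    done, carry = budgeted_generation_step(pool, budget=12)
    print([r.rid for r in done], [(r.rid, r.generated) for r in carry])  # [0, 2] [(1, 5)]
```

In the usage example, the two short rollouts finish within the 12-token budget and can be trained on immediately. The long rollout is paused at 5 tokens rather than stalling the whole step, which is the utilization benefit the abstract attributes to partitioning long rollouts.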
Code & Models
- 🤗 inclusionAI/Ring-flash-2.0 (model) · 85 downloads · 101 likes
- 🤗 inclusionAI/Ring-mini-2.0 (model) · 798 downloads · 181 likes
- 🤗 inclusionAI/Ring-1T-preview (model) · 26 downloads · 267 likes
- 🤗 inclusionAI/Ring-1T (model) · 134 downloads · 230 likes
- 🤗 inclusionAI/Ring-1T-FP8 (model) · 1.6k downloads · 19 likes
- 🤗 servantofares/Ring-mini-2.0 (model) · 23 downloads
