AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning
Yang Chen, Zhuolin Yang, Zihan Liu, Chankyu Lee, Peng Xu, Mohammad Shoeybi, Bryan Catanzaro, Wei Ping

TL;DR
This paper demonstrates that large-scale reinforcement learning (RL) can significantly improve the reasoning abilities of strong small- and mid-sized models, surpassing state-of-the-art distillation-based models, by combining a systematic training recipe with robust data curation.
Contribution
It introduces a simple RL training recipe that trains first on math-only prompts and then on code-only prompts, showing substantial performance gains and offering insights into curriculum learning and model stabilization.
Findings
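Math-only RL improves math benchmark performance (+14.6% / +17.2% on AIME 2025 for the 7B / 14B models)
Math-only RL also improves code reasoning (+6.8% / +5.8% on LiveCodeBench for the 7B / 14B models)
Extended code-only RL further improves code benchmarks with minimal or no degradation in math results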
Abstract
Despite recent progress in large-scale reinforcement learning (RL) for reasoning, the training recipe for building high-performing reasoning models remains elusive. Key implementation details of frontier models, such as DeepSeek-R1, including data curation strategies and RL training recipe, are often omitted. Moreover, recent research indicates distillation remains more effective than RL for smaller models. In this work, we demonstrate that large-scale RL can significantly enhance the reasoning capabilities of strong, small- and mid-sized models, achieving results that surpass those of state-of-the-art distillation-based models. We systematically study the RL training process through extensive ablations and propose a simple yet effective approach: first training on math-only prompts, then on code-only prompts. Notably, we find that math-only RL not only significantly enhances the…
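The two-stage ordering described in the abstract (math-only prompts first, then code-only prompts) can be sketched as a simple prompt schedule. This is only an illustration of the ordering, not the authors' training code: the `Prompt` and `Stage` types, the `staged_curriculum` function, and the step counts are all hypothetical names and values chosen for the example, and no RL update is performed.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class Prompt:
    domain: str  # "math" or "code" (hypothetical labels)
    text: str


@dataclass(frozen=True)
class Stage:
    name: str    # label for the stage, e.g. "math-only"
    domain: str  # which prompt domain this stage draws from
    steps: int   # number of prompts to serve (placeholder value)


def staged_curriculum(
    prompts: List[Prompt], stages: List[Stage]
) -> Iterator[Tuple[str, Prompt]]:
    """Yield (stage name, prompt) pairs, finishing each stage before the next.

    Within a stage, prompts of the matching domain are served round-robin.
    """
    for stage in stages:
        pool = [p for p in prompts if p.domain == stage.domain]
        if not pool:
            raise ValueError(f"no prompts for domain {stage.domain!r}")
        for step in range(stage.steps):
            yield stage.name, pool[step % len(pool)]


# Example: a math-only stage followed by a code-only stage.
prompts = [Prompt("math", "m1"), Prompt("code", "c1"), Prompt("math", "m2")]
stages = [Stage("math-only", "math", 3), Stage("code-only", "code", 2)]
schedule = list(staged_curriculum(prompts, stages))
```

In a real pipeline each yielded prompt would feed a rollout and a policy update; the sketch only shows that no code prompt is served until the math-only stage has finished.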
Code & Models
- 🤗 nvidia/AceReason-Nemotron-14B (model) · 22k downloads · ♡ 96
- 🤗 nvidia/AceReason-Nemotron-7B (model) · 5.2k downloads · ♡ 20
- 🤗 unsloth/AceReason-Nemotron-14B (model) · 9 downloads
- 🤗 unsloth/AceReason-Nemotron-14B-GGUF (model) · 555 downloads · ♡ 9
- 🤗 lmstudio-community/AceReason-Nemotron-14B-GGUF (model) · 62 downloads · ♡ 2
- 🤗 lucyknada/nvidia_AceReason-Nemotron-14B-exl3 (model)
- 🤗 lmstudio-community/AceReason-Nemotron-7B-GGUF (model) · 19 downloads · ♡ 1
- 🤗 QuantFactory/AceReason-Nemotron-7B-GGUF (model) · 97 downloads · ♡ 2
- 🤗 QuantFactory/AceReason-Nemotron-14B-GGUF (model) · 15 downloads · ♡ 2
- 🤗 Prince-1/AceReason-Nemotron-14B-Onnx (model) · ♡ 1
Taxonomy
Topics: Reinforcement Learning in Robotics · Topic Modeling · Multimodal Machine Learning Applications
