AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning

Yang Chen; Zhuolin Yang; Zihan Liu; Chankyu Lee; Peng Xu; Mohammad Shoeybi; Bryan Catanzaro; Wei Ping

arXiv:2505.16400·cs.LG·June 6, 2025

TL;DR

This paper demonstrates that large-scale reinforcement learning can significantly improve the reasoning abilities of strong small- and mid-sized models, surpassing state-of-the-art distillation-based models, by combining a systematically ablated training recipe with careful data curation.

Contribution

The paper proposes a simple RL training recipe, training first on math-only prompts and then on code-only prompts, and reports substantial performance gains along with insights into curriculum learning and training stability.
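The staged ordering described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Stage` class, the `staged_schedule` function, the placeholder prompts, and the step counts are all hypothetical, and the sketch only shows the ordering idea (finish the math-only stage before starting the code-only stage), not the RL update itself.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Stage:
    """One curriculum stage (hypothetical structure, for illustration only)."""
    name: str           # label for the stage, e.g. "math-only"
    prompts: list[str]  # prompt pool sampled during this stage
    steps: int          # assumed number of RL steps spent in this stage


def staged_schedule(stages: list[Stage]) -> Iterator[tuple[int, str, str]]:
    """Yield (global_step, stage_name, prompt), finishing each stage before the next."""
    step = 0
    for stage in stages:
        for i in range(stage.steps):
            # Cycle through the stage's prompt pool in order.
            yield step, stage.name, stage.prompts[i % len(stage.prompts)]
            step += 1


# Placeholder prompts and step counts; real values are not given in this summary.
schedule = list(staged_schedule([
    Stage("math-only", ["math prompt A", "math prompt B"], steps=3),
    Stage("code-only", ["code prompt A"], steps=2),
]))

for global_step, stage_name, prompt in schedule:
    print(global_step, stage_name, prompt)
```

In a real pipeline each yielded prompt would feed a rollout and a policy update; those parts are deliberately left out because this summary does not specify them.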

Findings

01

Math-only RL improves math benchmark performance of strong distilled models (+14.6% / +17.2% on AIME 2025 for the 7B / 14B models)

02

Math-only RL also improves code reasoning (+6.8% / +5.8% on LiveCodeBench for the 7B / 14B models)

03

Extended code-only RL further improves code benchmark performance with minimal impact on math results

Abstract

Despite recent progress in large-scale reinforcement learning (RL) for reasoning, the training recipe for building high-performing reasoning models remains elusive. Key implementation details of frontier models, such as DeepSeek-R1, including data curation strategies and RL training recipe, are often omitted. Moreover, recent research indicates distillation remains more effective than RL for smaller models. In this work, we demonstrate that large-scale RL can significantly enhance the reasoning capabilities of strong, small- and mid-sized models, achieving results that surpass those of state-of-the-art distillation-based models. We systematically study the RL training process through extensive ablations and propose a simple yet effective approach: first training on math-only prompts, then on code-only prompts. Notably, we find that math-only RL not only significantly enhances the…

Code & Models

Videos

AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Topic Modeling · Multimodal Machine Learning Applications