Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models

Boxin Wang; Chankyu Lee; Nayeon Lee; Sheng-Chieh Lin; Wenliang Dai; Yang Chen; Yangyi Chen; Zhuolin Yang; Zihan Liu; Mohammad Shoeybi; Bryan Catanzaro; Wei Ping

arXiv:2512.13607 · cs.CL · March 30, 2026

15 Models 3 Datasets

TL;DR

This paper introduces Nemotron-Cascade, a cascaded reinforcement learning framework that enhances general-purpose reasoning models' performance and training efficiency across multiple domains.

Contribution

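It proposes Cascade RL, a sequential, domain-wise reinforcement learning approach that improves reasoning, reduces engineering complexity, and achieves state-of-the-art results, with a model that supports both instruct and deep thinking modes showing no performance gap relative to a thinking-only counterpart.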

Findings

01

Outperforms previous models on multiple benchmarks.

02

Applying RLHF as an initial stage, before the domain-wise RL stages, significantly boosts reasoning ability.

03

The model achieves silver-medal performance at the International Olympiad in Informatics (IOI).

Abstract

Building general-purpose reasoning models with reinforcement learning (RL) entails substantial cross-domain heterogeneity, including large variation in inference-time response lengths and verification latency. Such variability complicates the RL infrastructure, slows training, and makes training curriculum (e.g., response length extension) and hyperparameter selection challenging. In this work, we propose cascaded domain-wise reinforcement learning (Cascade RL) to develop Nemotron-Cascade, capable of operating in both instruct and deep thinking modes, without any performance gap relative to a thinking-only counterpart. Departing from conventional approaches that blend heterogeneous prompts from different domains, Cascade RL orchestrates sequential, domain-wise RL, reducing engineering complexity and delivering state-of-the-art performance across a wide range of benchmarks. Notably, RLHF…
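To make the abstract's efficiency argument concrete, here is a toy cost model. It is a minimal sketch, not the authors' implementation or measurements. It assumes synchronous RL steps in which a batch finishes only when its slowest rollout has been generated and verified. It also assumes that both schedules get the same total step budget and that the cascade splits that budget evenly across stages. The Stage class, the domain names, the stage order, and every timing and token budget below are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Stage:
    """One domain-wise RL stage. All values are hypothetical, for illustration only."""
    domain: str
    gen_seconds: float          # wall time of the slowest rollout in a batch
    verify_seconds: float       # wall time of the slowest reward/verifier call
    max_response_tokens: int    # per-stage length budget (may differ by stage)

    @property
    def step_seconds(self) -> float:
        # Synchronous step (assumption): the batch finishes when its slowest
        # sample has been generated and then verified.
        return self.gen_seconds + self.verify_seconds


def blended_seconds(stages: List[Stage], total_steps: int) -> float:
    """Every batch mixes all domains, so every step pays the slowest domain's cost."""
    return total_steps * max(s.step_seconds for s in stages)


def cascaded_seconds(stages: List[Stage], total_steps: int) -> float:
    """Domains run one after another, and each stage pays only its own domain's cost.
    The step budget is split evenly across stages (an assumption)."""
    steps_per_stage = total_steps // len(stages)
    return sum(steps_per_stage * s.step_seconds for s in stages)


def cascade_schedule(stages: List[Stage], total_steps: int) -> List[Tuple[str, int, int]]:
    """Return (domain, steps, max_response_tokens) for each sequential stage."""
    steps_per_stage = total_steps // len(stages)
    return [(s.domain, steps_per_stage, s.max_response_tokens) for s in stages]


# Hypothetical stages; names, order, and numbers are illustrative only.
stages = [
    Stage("alignment", gen_seconds=20.0, verify_seconds=2.0, max_response_tokens=4096),
    Stage("math", gen_seconds=120.0, verify_seconds=5.0, max_response_tokens=16384),
    Stage("code", gen_seconds=150.0, verify_seconds=60.0, max_response_tokens=32768),
]

print(blended_seconds(stages, total_steps=300))   # 63000.0
print(cascaded_seconds(stages, total_steps=300))  # 35700.0
for domain, steps, max_tokens in cascade_schedule(stages, total_steps=300):
    print(domain, steps, max_tokens)
```

Under these assumptions, the cascaded schedule's total time equals the step budget times the mean per-domain step cost. The blended schedule pays the maximum per-domain cost on every step, so the cascade is never slower. Each stage can also carry its own response-length budget instead of one budget shared by all domains. Real systems with asynchronous rollouts or dynamic batching may narrow this gap.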
