Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models
Boxin Wang, Chankyu Lee, Nayeon Lee, Sheng-Chieh Lin, Wenliang Dai, Yang Chen, Yangyi Chen, Zhuolin Yang, Zihan Liu, Mohammad Shoeybi, Bryan Catanzaro, Wei Ping

TL;DR
This paper introduces Nemotron-Cascade, a cascaded reinforcement learning framework that enhances general-purpose reasoning models' performance and training efficiency across multiple domains.
Contribution
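It proposes Cascade RL, a sequential, domain-wise reinforcement learning approach that reduces engineering complexity and delivers state-of-the-art results. The resulting model operates in both instruct and deep thinking modes with no performance gap relative to a thinking-only counterpart.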
Findings
Outperforms previous models on multiple benchmarks.
RLHF, applied as a pre-step before domain-wise RL, significantly boosts reasoning ability.
The model achieves silver-medal-level performance at the International Olympiad in Informatics (IOI).
Abstract
Building general-purpose reasoning models with reinforcement learning (RL) entails substantial cross-domain heterogeneity, including large variation in inference-time response lengths and verification latency. Such variability complicates the RL infrastructure, slows training, and makes training curriculum (e.g., response length extension) and hyperparameter selection challenging. In this work, we propose cascaded domain-wise reinforcement learning (Cascade RL) to develop Nemotron-Cascade, capable of operating in both instruct and deep thinking modes, without any performance gap relative to a thinking-only counterpart. Departing from conventional approaches that blend heterogeneous prompts from different domains, Cascade RL orchestrates sequential, domain-wise RL, reducing engineering complexity and delivering state-of-the-art performance across a wide range of benchmarks. Notably, RLHF…
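The contrast the abstract draws between blending prompts from different domains and running sequential, domain-wise RL can be illustrated with a minimal scheduling sketch. This is not the authors' implementation: the function names (`cascade_schedule`, `blended_schedule`), the domain labels, and the batch size are hypothetical, and the RL update itself is omitted. It only shows how batches would be ordered under each scheme.

```python
import random
from typing import Dict, Iterator, List, Tuple


def cascade_schedule(
    prompts_by_domain: Dict[str, List[str]],
    stage_order: List[str],
    batch_size: int,
) -> Iterator[Tuple[str, List[str]]]:
    """Yield (domain, batch) pairs one domain at a time, in stage order.

    Every batch contains prompts from a single domain, so each stage can
    use its own (assumed) settings such as response length or verifier.
    """
    for domain in stage_order:
        prompts = prompts_by_domain[domain]
        for start in range(0, len(prompts), batch_size):
            yield domain, prompts[start:start + batch_size]


def blended_schedule(
    prompts_by_domain: Dict[str, List[str]],
    batch_size: int,
    seed: int = 0,
) -> Iterator[List[Tuple[str, str]]]:
    """Baseline for comparison: yield batches that mix all domains."""
    rng = random.Random(seed)
    pool = [(d, p) for d, ps in prompts_by_domain.items() for p in ps]
    rng.shuffle(pool)
    for start in range(0, len(pool), batch_size):
        yield pool[start:start + batch_size]


if __name__ == "__main__":
    # Hypothetical toy data; domain names are placeholders.
    data = {"math": ["m1", "m2", "m3"], "code": ["c1", "c2"]}
    for domain, batch in cascade_schedule(data, ["math", "code"], batch_size=2):
        print(domain, batch)
```

In the cascaded schedule every batch is homogeneous, which is what would allow per-stage choices such as a response-length limit or a verification backend to be made for one domain at a time. In the blended schedule a single batch can contain prompts with very different response lengths and verification latencies.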
Code & Models
- 🤗 nvidia/Nemotron-Cascade-8B-Thinking · model · 4.6k dl · ♡ 40
- 🤗 nvidia/Nemotron-Cascade-14B-Thinking · model · 1.7k dl · ♡ 79
- 🤗 nvidia/Nemotron-Cascade-8B · model · 2.0k dl · ♡ 67
- 🤗 cyankiwi/Nemotron-Cascade-14B-Thinking-AWQ-4bit · model · 20 dl · ♡ 1
- 🤗 cyankiwi/Nemotron-Cascade-14B-Thinking-AWQ-8bit · model · 3 dl
- 🤗 cyankiwi/Nemotron-Cascade-8B-Thinking-AWQ-4bit · model · 1 dl
- 🤗 cyankiwi/Nemotron-Cascade-8B-Thinking-AWQ-8bit · model · 4 dl
- 🤗 cyankiwi/Nemotron-Cascade-8B-AWQ-4bit · model · 77 dl · ♡ 1
- 🤗 cyankiwi/Nemotron-Cascade-8B-AWQ-8bit · model · 1 dl · ♡ 1
- 🤗 nvidia/Nemotron-Cascade-8B-Intermediate-ckpts · model · ♡ 13
