The Implicit Curriculum: Learning Dynamics in RL with Verifiable Rewards
Yu Huang, Zixin Wen, Yuejie Chi, Yuting Wei, Aarti Singh, Yingbin Liang, Yuxin Chen

TL;DR
This paper develops a theory of how reinforcement learning with verifiable rewards (RLVR) naturally creates an implicit curriculum, enabling extended reasoning in transformers through a progression from easy to hard problems.
Contribution
It introduces a theoretical framework for understanding the implicit curriculum in RLVR. The framework highlights the role of the smoothness of the difficulty spectrum and adapts Fourier analysis techniques.
Findings
Implicit curriculum emerges naturally during training.
Smooth difficulty spectrum leads to stable learning progression.
Abrupt difficulty jumps cause phase transitions and training plateaus.
Abstract
Reinforcement learning with verifiable rewards (RLVR) has been a main driver of recent breakthroughs in large reasoning models. Yet it remains a mystery how rewards based solely on final outcomes can help overcome the long-horizon barrier to extended reasoning. To understand this, we develop a theory of the training dynamics of RLVR for transformers on compositional reasoning tasks. Our theory shows that mixed-difficulty training naturally follows an implicit curriculum: without any explicit schedule, easier problems become learnable first and shape the frontier for harder ones, creating a learning progression from easy to hard during optimization. The effectiveness of this curriculum is governed by the smoothness of the difficulty spectrum. When the spectrum is smooth, training dynamics enters a well-behaved relay regime, in which persistent gradient signals on easier problems make…
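The relay mechanism described above can be illustrated with a deliberately simple toy model. This is a hypothetical sketch, not the authors' transformer setting or their analysis. It assumes that a problem of length k is solved only if reasoning steps 1..k are all correct, that step j succeeds independently with probability q_j = sigmoid(theta_j), and that training is exact gradient ascent on the expected 0/1 outcome reward averaged over a chosen set of problem lengths. The function name `train` and its parameters (`lengths`, `num_steps`, `lr`, `iters`, `init`) are illustrative assumptions. The gradient for step j is nonzero only on problems of length at least j, and it is scaled by the success probability of all earlier steps. A smooth spectrum therefore lets each newly learned step "hand off" signal to the next, while a jump from length 1 to length 6 leaves the middle steps with a vanishing signal.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def train(lengths, num_steps=6, lr=1.0, iters=4000, init=-3.0):
    """Hypothetical toy model of outcome-only rewards (not the paper's model).

    A problem of length k is solved iff steps 0..k-1 are all correct, and step j
    is correct independently with probability q_j = sigmoid(theta_j).
    Runs gradient ascent on the expected reward averaged over `lengths` and
    returns the final per-step success probabilities.
    """
    theta = [init] * num_steps
    for _ in range(iters):
        q = [sigmoid(t) for t in theta]
        grad = [0.0] * num_steps
        for k in lengths:
            # Expected outcome reward on a length-k problem.
            r = math.prod(q[:k])
            # d r / d theta_j = r * (1 - q_j) for every step j the problem uses.
            for j in range(k):
                grad[j] += r * (1.0 - q[j]) / len(lengths)
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return [sigmoid(t) for t in theta]


# Smooth difficulty spectrum: every length from 1 to 6 appears in training.
smooth = train(lengths=[1, 2, 3, 4, 5, 6])
# Abrupt spectrum: only length-1 and length-6 problems.
jumpy = train(lengths=[1, 6])
print("smooth:", [round(p, 3) for p in smooth])
print("jumpy: ", [round(p, 3) for p in jumpy])
```

In this toy, the smooth run learns the steps one after another, because length-(j+1) problems start paying off as soon as step j is reliable. In the abrupt run, step 1 is learned from length-1 problems, but steps 2 through 6 receive a gradient proportional to the product of five small success probabilities and stay on a plateau. The toy omits anything the paper may model about transformers or Fourier structure.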
