Linear Dynamics in the RLVR Training of Large Language Models
Tianle Wang, Jiayu Liu, Zhongyuan Wu, Shenghao Jin, Wei Chen, Hao Xu, Ning Miao

TL;DR
This paper identifies a consistent linear regime in the RLVR training of large language models and uses extrapolation techniques to exploit it for faster training and improved performance.
Contribution
The paper uncovers linear dynamics in RLVR training, offers a theoretical explanation for them, and proposes practical extrapolation methods that improve training efficiency and model performance.
Findings
RLVR training exhibits a highly linear evolution of weights and log-probabilities.
Weight-space extrapolation achieves a 6.1× training speedup with performance comparable to standard RL.
Output-space extrapolation outperforms standard RL, improving benchmark scores by 4.2%.
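If weights really move along a near-linear trajectory, a later checkpoint can be predicted from earlier ones without running the intervening RL steps. The sketch below illustrates that idea. It is a generic least-squares linear extrapolation of our own, not the paper's exact procedure. This page does not say which checkpoints the authors use or how far they extrapolate. The helper names `fit_linear_drift` and `extrapolate_weights` are hypothetical. Parameters are flat lists of floats to keep the sketch dependency-free. In practice the same rule would be applied tensor by tensor to a saved checkpoint.

```python
from typing import List, Sequence, Tuple


def fit_linear_drift(
    steps: Sequence[float], checkpoints: Sequence[Sequence[float]]
) -> Tuple[List[float], List[float]]:
    """Per-coordinate least-squares fit theta_i(t) ~ intercept_i + slope_i * t.

    `checkpoints[k]` is a flattened parameter vector saved at training step `steps[k]`.
    (Hypothetical helper for illustration.)
    """
    n = len(steps)
    if n < 2 or n != len(checkpoints):
        raise ValueError("need at least two checkpoints, one per step")
    t_mean = sum(steps) / n
    s_tt = sum((t - t_mean) ** 2 for t in steps)
    if s_tt == 0:
        raise ValueError("steps must not all be equal")
    intercepts, slopes = [], []
    for i in range(len(checkpoints[0])):
        ys = [ckpt[i] for ckpt in checkpoints]
        y_mean = sum(ys) / n
        # Ordinary least-squares slope for this coordinate.
        slope = sum((t - t_mean) * (y - y_mean) for t, y in zip(steps, ys)) / s_tt
        slopes.append(slope)
        intercepts.append(y_mean - slope * t_mean)
    return intercepts, slopes


def extrapolate_weights(
    steps: Sequence[float], checkpoints: Sequence[Sequence[float]], target_step: float
) -> List[float]:
    """Predict parameters at `target_step` by following the fitted linear drift."""
    intercepts, slopes = fit_linear_drift(steps, checkpoints)
    return [a + b * target_step for a, b in zip(intercepts, slopes)]


# Usage: three toy checkpoints of a 2-parameter "model", extrapolated to step 600.
steps = [0, 100, 200]
checkpoints = [[0.0, 1.0], [0.1, 0.9], [0.2, 0.8]]
predicted = extrapolate_weights(steps, checkpoints, target_step=600)
print(predicted)  # approximately [0.6, 0.4], up to floating-point rounding
```

With exactly two checkpoints the fit reduces to the two-point rule θ_b + (T − t_b)/(t_b − t_a) · (θ_b − θ_a). Fitting over more checkpoints averages out some of the step-to-step noise. The speedup comes from jumping straight to the predicted weights instead of training through the remaining steps.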
Abstract
Reinforcement learning with verifiable rewards (RLVR) has driven significant performance gains in reasoning-oriented large language models (LLMs), yet its internal training dynamics remain largely a black box. In this work, we perform a comprehensive trajectory-level analysis of RLVR and uncover a striking regularity: across various model families, RL algorithms, and training configurations, RLVR consistently enters a robust linear regime, where both parameter weights and output log-probabilities, measured rigorously via teacher-forced evaluation, evolve in a highly linear manner (). Through controlled experiments and theoretical analysis, we demonstrate that this linearity is not a coincidence, but stems from the high-variance, noisy nature of RLVR training signals, which act as a low-pass filter to concentrate optimization along a stable, low-dimensional drift. Moreover, we…
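One simple way to quantify "evolve in a highly linear manner" is the coefficient of determination R² of a straight-line fit of a trajectory against the training step. The trajectory could be a single weight coordinate or, as an assumption here, the teacher-forced log-probability of a fixed response under each checkpoint. This page does not state the authors' exact metric, so treat the sketch as illustrative. The second helper is a toy model of our own, not the paper's theoretical analysis. Each update is a constant drift plus high-variance Gaussian noise, and the parameter is the running sum of those updates. Summation acts as an integrating, low-pass filter. Accumulated noise grows roughly like √t while the drift grows like t, so the trajectory looks nearly linear even when individual steps are mostly noise. Both function names are hypothetical.

```python
import random
from typing import List, Sequence


def r_squared_linear_fit(steps: Sequence[float], values: Sequence[float]) -> float:
    """R^2 of an ordinary least-squares line values ~ a + b * steps."""
    n = len(steps)
    if n < 2 or n != len(values):
        raise ValueError("need at least two (step, value) pairs")
    t_mean = sum(steps) / n
    y_mean = sum(values) / n
    s_tt = sum((t - t_mean) ** 2 for t in steps)
    if s_tt == 0:
        raise ValueError("steps must not all be equal")
    slope = sum((t - t_mean) * (y - y_mean) for t, y in zip(steps, values)) / s_tt
    intercept = y_mean - slope * t_mean
    ss_res = sum((y - (intercept + slope * t)) ** 2 for t, y in zip(steps, values))
    ss_tot = sum((y - y_mean) ** 2 for y in values)
    if ss_tot == 0:
        # Guard against division by zero, e.g. for an exactly constant trajectory.
        return 1.0
    return 1.0 - ss_res / ss_tot


def noisy_drift_trajectory(
    num_steps: int, drift: float, noise_std: float, seed: int = 0
) -> List[float]:
    """Toy model: position after each of `num_steps` updates of size drift + N(0, noise_std^2)."""
    rng = random.Random(seed)
    position, trajectory = 0.0, [0.0]
    for _ in range(num_steps):
        position += drift + rng.gauss(0.0, noise_std)
        trajectory.append(position)
    return trajectory


# Usage: an exactly linear trajectory scores 1.0; a curved one scores lower.
print(r_squared_linear_fit([0, 1, 2, 3], [1, 3, 5, 7]))  # 1.0
print(r_squared_linear_fit([0, 1, 2, 3], [0, 1, 4, 9]))  # 1 - 4/49, about 0.918

# Per-step noise (std 3) is three times the drift (1), yet the summed path is nearly linear.
toy_steps = list(range(2001))
toy_traj = noisy_drift_trajectory(2000, drift=1.0, noise_std=3.0, seed=0)
print(r_squared_linear_fit(toy_steps, toy_traj))  # typically close to 1
```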
