Linear Dynamics in the RLVR Training of Large Language Models

Tianle Wang; Jiayu Liu; Zhongyuan Wu; Shenghao Jin; Wei Chen; Hao Xu; Ning Miao

arXiv:2601.04537·cs.LG·May 22, 2026

TL;DR

This paper reveals a consistent linear regime in RLVR training of large language models, enabling faster training and improved performance through extrapolation techniques.

Contribution

The paper uncovers linear dynamics in RLVR training and provides theoretical insights and practical extrapolation methods that improve training efficiency and model performance.

Findings

01

RLVR training exhibits a highly linear evolution of weights and log-probabilities ($R^{2} > 0.7$).

02

Weight-space extrapolation achieves 6.1x training speedup with performance comparable to standard RL.

03

Output-space extrapolation outperforms standard RL, improving benchmark scores by 4.2%.
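
The two extrapolation ideas in the findings above can be illustrated with a minimal sketch. This page does not describe the paper's actual procedure, so everything here is an assumption: the function names `extrapolate` and `extrapolate_logprobs` are hypothetical, checkpoints are represented as plain lists of floats, and the renormalization step in output space is one plausible choice rather than the authors' method.

```python
import math
from typing import List


def extrapolate(early: List[float], late: List[float], alpha: float) -> List[float]:
    """Move along the straight line from `early` through `late`.

    alpha = 1 reproduces `late`; alpha > 1 extrapolates past it.
    (Hypothetical helper, not taken from the paper.)
    """
    return [e + alpha * (l - e) for e, l in zip(early, late)]


def step_ratio(step_early: int, step_late: int, step_target: int) -> float:
    """Convert a target training step into the alpha used by `extrapolate`."""
    if step_late == step_early:
        raise ValueError("checkpoints must come from different steps")
    return (step_target - step_early) / (step_late - step_early)


def extrapolate_logprobs(
    logp_early: List[float], logp_late: List[float], alpha: float
) -> List[float]:
    """Extrapolate next-token log-probabilities, then renormalize.

    Assumption: a linearly extrapolated vector is no longer a valid
    distribution, so it is renormalized with a log-sum-exp.
    """
    raw = extrapolate(logp_early, logp_late, alpha)
    m = max(raw)
    log_z = m + math.log(sum(math.exp(v - m) for v in raw))
    return [v - log_z for v in raw]


# Weight space: two toy checkpoints at steps 100 and 200, target step 400.
w_100 = [0.0, 1.0]
w_200 = [0.5, 0.0]
w_400 = extrapolate(w_100, w_200, step_ratio(100, 200, 400))

# Output space: a toy two-token distribution that is drifting toward token 0.
logp_a = [math.log(0.5), math.log(0.5)]
logp_b = [math.log(0.6), math.log(0.4)]
logp_ext = extrapolate_logprobs(logp_a, logp_b, alpha=2.0)
```

In this toy setup the weight-space variant replaces further training steps with arithmetic on two saved checkpoints, while the output-space variant leaves the weights alone and combines the predictions of two checkpoints at inference time.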

Abstract

Reinforcement learning with verifiable rewards (RLVR) has driven significant performance gains in reasoning-oriented large language models (LLMs), yet its internal training dynamics remain largely a black box. In this work, we perform a comprehensive trajectory-level analysis of RLVR and uncover a striking regularity: across various model families, RL algorithms, and training configurations, RLVR consistently enters a robust linear regime, where both parameter weights and output log-probabilities, measured rigorously via teacher-forced evaluation, evolve in a highly linear manner ( $R^{2} > 0.7$ ). Through controlled experiments and theoretical analysis, we demonstrate that this linearity is not a coincidence, but stems from the high-variance, noisy nature of RLVR training signals, which act as a low-pass filter to concentrate optimization along a stable, low-dimensional drift. Moreover, we…
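
As an illustration of the $R^{2}$ criterion mentioned in the abstract, the sketch below fits a straight line to one scalar trajectory (for example, a single weight or a single token's teacher-forced log-probability recorded at several training steps) and reports the coefficient of determination. The function name `linear_fit_r2` and the toy trajectories are hypothetical, and the abstract does not say exactly how the authors aggregate $R^{2}$ across parameters or tokens.

```python
from typing import Sequence


def linear_fit_r2(steps: Sequence[float], values: Sequence[float]) -> float:
    """R^2 of an ordinary least-squares line fitted to values vs. steps."""
    n = len(steps)
    if n < 2 or n != len(values):
        raise ValueError("need at least two (step, value) pairs of equal length")

    mean_x = sum(steps) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in steps)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(steps, values))
    syy = sum((y - mean_y) ** 2 for y in values)

    if sxx == 0:
        raise ValueError("all steps are identical")
    if syy == 0:
        return 1.0  # a constant trajectory is fitted exactly by a flat line

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(steps, values))
    return 1.0 - ss_res / syy


# Toy trajectories: one drifts linearly, the other oscillates.
steps = [0, 1, 2, 3]
r2_linear = linear_fit_r2(steps, [-2.0, -1.5, -1.0, -0.5])
r2_noisy = linear_fit_r2(steps, [0.0, 1.0, 0.0, 1.0])
```

Under this definition, a trajectory would count as being in the linear regime described in the abstract when its score exceeds 0.7.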

Code & Models

Repositories

Miaow-Lab/RLVR-Linearity
github

Models

Miaow-Lab/RLVR-Linearity-Checkpoints
model

Datasets

Miaow-Lab/RLVR-Linearity-Dataset
dataset · 29 downloads

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)