You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories

Zhepei Wei; Xinyu Zhu; Wei-Lin Chen; Chengsong Huang; Jiaxin Huang; Yu Meng

arXiv:2605.21468·cs.LG·May 21, 2026

TL;DR

This paper reveals that RLVR weight trajectories are low-rank and predictable, enabling a simple linear extrapolation method called RELEX to efficiently approximate future checkpoints with minimal training.

Contribution

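The authors introduce RELEX (REinforcement Learning EXtrapolation), a rank-1 extrapolation technique that estimates the rank-1 subspace of the RLVR parameter deltas from a short observation window and predicts future checkpoints via linear regression, reducing the number of RLVR training steps required for large language models.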

Findings

01. RELEX matches or exceeds RLVR performance with only 15% of training steps.

02. RELEX can extrapolate checkpoints up to 10 to 20 times beyond the observed data.

03. The method's success is due to denoising effects from rank-1 projection.

Abstract

Reinforcement learning with verifiable rewards (RLVR) has become a dominant paradigm for improving reasoning in large language models (LLMs), yet the underlying geometry of the resulting parameter trajectories remains underexplored. In this work, we demonstrate that RLVR weight trajectories are extremely low-rank and highly predictable. Specifically, we find that the majority of downstream performance gains are captured by a rank-1 approximation of the parameter deltas, where the magnitude of this projection evolves near-linearly with training steps. Motivated by this, we propose a simple and compute-efficient method RELEX (REinforcement Learning EXtrapolation), which estimates the rank-1 subspace from a short observation window and extrapolates future checkpoints via linear regression, with no learned model required. Across three models (i.e., Qwen2.5-Math-1.5B, Qwen3-4B-Base, and…
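The procedure the abstract describes (estimate a rank-1 direction from a few early checkpoints, then extrapolate its magnitude linearly in the training step) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `rank1_extrapolate`, the flattening of parameters into one vector, the use of an uncentered SVD over stacked deltas, and the fit with an intercept are all choices made here for illustration, and the released code may differ on each point.

```python
import numpy as np


def rank1_extrapolate(base, checkpoints, steps, target_step):
    """Hypothetical sketch: predict parameters at `target_step`.

    base        -- parameters before RLVR training (array of any shape)
    checkpoints -- arrays observed during the short training window
    steps       -- training step of each observed checkpoint
    target_step -- step to extrapolate to (may lie far beyond the window)
    """
    base_flat = np.ravel(base).astype(float)
    steps = np.asarray(steps, dtype=float)

    # Each row is one flattened parameter delta: theta_t - theta_0.
    deltas = np.stack([np.ravel(c).astype(float) - base_flat for c in checkpoints])

    # The top right singular vector spans the best rank-1 subspace of the deltas.
    _, _, vt = np.linalg.svd(deltas, full_matrices=False)
    direction = vt[0]

    # Magnitude of each observed delta along that direction.
    coeffs = deltas @ direction

    # Assumed near-linear growth: coeff ~ slope * step + intercept.
    slope, intercept = np.polyfit(steps, coeffs, deg=1)
    predicted = slope * target_step + intercept

    # The SVD sign ambiguity cancels, since coeffs and direction flip together.
    return (base_flat + predicted * direction).reshape(np.shape(base))
```

Everything orthogonal to the estimated direction is discarded, which is the step the findings above associate with a denoising effect. At least two observed checkpoints at distinct steps are needed for the linear fit.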

Code & Models

Repositories

weizhepei/RELEX (GitHub)
