Low-rank Optimization Trajectories Modeling for LLM RLVR Acceleration

Zhipeng Chen; Tao Qian; Wayne Xin Zhao; Ji-Rong Wen

arXiv:2604.11446 · cs.LG · April 14, 2026

TL;DR

This paper introduces NExt, a nonlinear low-rank trajectory modeling framework that accelerates large language model reinforcement learning with verifiable rewards, reducing computational costs significantly.

Contribution

The paper proposes a novel nonlinear extrapolation method for low-rank parameter trajectories, improving RLVR efficiency for large language models.

Findings

1. Reduces RLVR computational overhead by approximately 37.5%.
2. Effectively models nonlinear parameter trajectories during RLVR.
3. Demonstrates robustness across various tasks and algorithms.

Abstract

Recently, scaling reinforcement learning with verifiable rewards (RLVR) for large language models (LLMs) has emerged as an effective training paradigm for significantly improving model capabilities. However, it requires guiding the model through extensive exploration and learning, and the resulting computational overhead has become a key challenge. To reduce the number of training steps, prior work performs linear extrapolation of model parameters. Yet the dynamics of model parameter updates during RLVR training remain insufficiently understood. To further investigate the evolution of LLMs during RLVR training, we conduct empirical experiments and find that the rank-1 subspace of the model does not evolve linearly, and its dominance over the original parameters is further amplified during LoRA training. Based on the above insights, we propose the \textbf{N}onlinear…
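
To make the two ideas in the abstract concrete, here is a minimal NumPy sketch of (a) measuring how much of a parameter update lies in its rank-1 subspace and (b) the linear extrapolation baseline attributed to prior work. It is an illustration under stated assumptions, not the NExt method or the authors' code: the function names, the toy matrices, and the energy-ratio definition of "dominance" are all hypothetical choices made for this example.

```python
import numpy as np


def rank1_dominance(delta):
    """Share of the squared Frobenius norm of `delta` held by its top singular value.

    Assumption: this energy ratio is one plausible way to quantify the
    "dominance" of the rank-1 subspace; the paper may define it differently.
    """
    s = np.linalg.svd(delta, compute_uv=False)
    return float(s[0] ** 2 / np.sum(s ** 2))


def rank1_component(delta):
    """Best rank-1 approximation of `delta` (truncated SVD)."""
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    return s[0] * np.outer(u[:, 0], vt[0])


def linear_extrapolate(w_prev, w_curr, alpha):
    """Continue the straight line from w_prev through w_curr by `alpha` extra steps."""
    return w_curr + alpha * (w_curr - w_prev)


# Toy data (hypothetical): an update that is mostly rank-1 plus small noise.
rng = np.random.default_rng(0)
w0 = rng.normal(size=(8, 6))
u = rng.normal(size=(8, 1))
v = rng.normal(size=(1, 6))
w1 = w0 + u @ v + 0.01 * rng.normal(size=(8, 6))

delta = w1 - w0
dominance = rank1_dominance(delta)
delta_r1 = rank1_component(delta)
w2_linear = linear_extrapolate(w0, w1, alpha=1.0)
```

In this toy setup `dominance` is close to 1 because the update was constructed to be nearly rank-1. The abstract's observation is that the rank-1 subspace does not evolve linearly during RLVR, so a straight-line prediction such as `w2_linear` can drift from the true trajectory, which is the motivation for a nonlinear extrapolation.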

Code & Models

Repositories

RUCAIBox/NExt (GitHub)
