RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments

Zhiyuan Zeng; Hamish Ivison; Yiping Wang; Lifan Yuan; Shuyue Stella Li; Zhuorui Ye; Siting Li; Jacqueline He; Runlong Zhou; Tong Chen; Chenyang Zhao; Yulia Tsvetkov; Simon Shaolei Du; Natasha Jaques; Hao Peng; Pang Wei Koh; Hannaneh Hajishirzi

arXiv:2511.07317·cs.CL·November 11, 2025

TL;DR

RLVE trains language models with reinforcement learning on verifiable environments that adapt problem difficulty to the policy's capabilities, which improves reasoning more than training on static data distributions.

Contribution

The paper presents RLVE, an approach in which each verifiable environment adapts its problem difficulty distribution to the policy model during training, together with RLVE-Gym, a suite of 400 manually engineered verifiable environments.

Findings

01 Environment scaling, i.e., expanding the collection of training environments, consistently improves generalizable reasoning capabilities.

02 RLVE achieves a 3.37% gain on benchmarks, outperforming RL training on static data.

03 Code is publicly released for reproducibility.

Abstract

We introduce Reinforcement Learning (RL) with Adaptive Verifiable Environments (RLVE), an approach using verifiable environments that procedurally generate problems and provide algorithmically verifiable rewards, to scale up RL for language models (LMs). RLVE enables each verifiable environment to dynamically adapt its problem difficulty distribution to the policy model's capabilities as training progresses. In contrast, static data distributions often lead to vanishing learning signals when problems are either too easy or too hard for the policy. To implement RLVE, we create RLVE-Gym, a large-scale suite of 400 verifiable environments carefully developed through manual environment engineering. Using RLVE-Gym, we show that environment scaling, i.e., expanding the collection of training environments, consistently improves generalizable reasoning capabilities. RLVE with joint training…
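
The abstract describes environments that procedurally generate problems, verify answers algorithmically, and shift their difficulty distribution as the policy improves, but it does not spell out the adaptation rule. The following is a minimal Python sketch of that idea under stated assumptions: the `AdditionEnv` class, the `window` and `threshold` parameters, the "raise the upper difficulty bound once frontier accuracy is high enough" rule, and the `stub_policy` stand-in are all hypothetical illustrations, not the RLVE-Gym implementation.

```python
import random
from dataclasses import dataclass, field


@dataclass
class AdditionEnv:
    """Hypothetical toy environment: add two d-digit integers (d = difficulty)."""
    max_difficulty: int = 1    # current upper bound of the difficulty range
    window: int = 20           # frontier attempts collected before re-evaluating
    threshold: float = 0.9     # assumed accuracy needed to raise the bound
    _frontier_results: list = field(default_factory=list)

    def generate(self, rng):
        """Procedurally generate a problem with difficulty in [1, max_difficulty]."""
        d = rng.randint(1, self.max_difficulty)
        lo, hi = 10 ** (d - 1), 10 ** d - 1
        a, b = rng.randint(lo, hi), rng.randint(lo, hi)
        return {"prompt": f"What is {a} + {b}?", "answer": a + b, "difficulty": d}

    def verify(self, problem, response):
        """Algorithmic verifier: reward 1.0 only if the response is the exact sum."""
        try:
            return 1.0 if int(response.strip()) == problem["answer"] else 0.0
        except ValueError:
            return 0.0

    def update(self, problem, reward):
        """Track accuracy at the hardest level; raise the bound once it is mastered."""
        if problem["difficulty"] != self.max_difficulty:
            return
        self._frontier_results.append(reward)
        if len(self._frontier_results) >= self.window:
            accuracy = sum(self._frontier_results) / len(self._frontier_results)
            if accuracy >= self.threshold:
                self.max_difficulty += 1
            self._frontier_results.clear()


def stub_policy(problem, skill=3):
    """Stand-in for a language model: solves problems of up to `skill` digits."""
    if problem["difficulty"] <= skill:
        return str(problem["answer"])
    return "unknown"


def run(steps=2000, seed=0):
    """Roll out the stub policy and return the final difficulty bound."""
    rng = random.Random(seed)
    env = AdditionEnv()
    for _ in range(steps):
        problem = env.generate(rng)
        reward = env.verify(problem, stub_policy(problem))
        env.update(problem, reward)
    return env.max_difficulty


if __name__ == "__main__":
    print(run())
```

In this sketch the bound settles one level above what the stub policy can solve, so the sampled problems are neither all trivially easy nor all out of reach. That is the situation the abstract contrasts with static distributions, where rewards that are uniformly 1 or uniformly 0 carry little learning signal.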

Taxonomy

Topics: Topic Modeling · Reinforcement Learning in Robotics · Multimodal Machine Learning Applications