RLEP: Reinforcement Learning with Experience Replay for LLM Reasoning

Hongzhi Zhang; Jia Fu; Jingyuan Zhang; Kai Fu; Qi Wang; Fuzheng Zhang; Guorui Zhou

arXiv:2507.07451·cs.CL·July 11, 2025

TL;DR

RLEP is a two-phase reinforcement learning framework for large language models: it first collects verified successful trajectories, then replays them during training, which speeds up convergence and improves reasoning accuracy on math benchmarks.

Contribution

The paper proposes RLEP, a novel reinforcement learning method that combines experience replay with verified trajectories to enhance LLM reasoning performance.

Findings

1. Faster convergence in training.
2. Improved accuracy on math benchmarks.
3. Effective replay of high-quality reasoning paths.

Abstract

Reinforcement learning (RL) for large language models is an energy-intensive endeavor: training can be unstable, and the policy may gradually drift away from its pretrained weights. We present RLEP (Reinforcement Learning with Experience rePlay), a two-phase framework that first collects verified trajectories and then replays them during subsequent training. At every update step, the policy is optimized on mini-batches that blend newly generated rollouts with these replayed successes. By replaying high-quality examples, RLEP steers the model away from fruitless exploration, focuses learning on promising reasoning paths, and delivers both faster convergence and stronger final performance. On the Qwen2.5-Math-7B base model, RLEP reaches baseline peak accuracy with substantially fewer updates and ultimately surpasses it, improving accuracy on AIME-2024 from 38.2% to…
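The two phases described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Trajectory` type, the function names, the binary reward, and the `n_fresh`/`n_replay` mixing sizes are all hypothetical choices made for illustration, and the policy-gradient update itself is left out. `generate` and `verify` stand in for the policy's sampler and an answer checker.

```python
import random
from dataclasses import dataclass


@dataclass
class Trajectory:
    prompt: str
    response: str
    reward: float  # assumed binary: 1.0 if the verifier accepted the answer, else 0.0


def collect_experience(prompts, generate, verify, samples_per_prompt=4):
    """Phase 1 (sketch): sample responses and keep only verified successes."""
    pool = {}
    for prompt in prompts:
        for _ in range(samples_per_prompt):
            response = generate(prompt)
            if verify(prompt, response):
                pool.setdefault(prompt, []).append(
                    Trajectory(prompt, response, 1.0)
                )
    return pool


def build_mixed_group(prompt, generate, verify, pool,
                      n_fresh=4, n_replay=1, rng=random):
    """Phase 2 (sketch): blend fresh rollouts with replayed successes."""
    fresh = []
    for _ in range(n_fresh):
        response = generate(prompt)
        reward = 1.0 if verify(prompt, response) else 0.0
        fresh.append(Trajectory(prompt, response, reward))
    # Replay at most n_replay stored successes for this prompt, if any exist.
    successes = pool.get(prompt, [])
    replayed = rng.sample(successes, min(n_replay, len(successes)))
    return fresh + replayed
```

In a training loop, each group returned by `build_mixed_group` would feed whatever policy-optimization step is in use. Prompts with no stored success simply fall back to fresh rollouts only.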

Code & Models

Models

Kwai-Klear/qwen2.5-math-rlep
model · 3 dl

Datasets

Kwai-Klear/RLEP_dataset
dataset · 102 dl

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques