RePO: Replay-Enhanced Policy Optimization

Siheng Li; Zhanhui Zhou; Wai Lam; Chao Yang; Chaochao Lu

arXiv:2506.09340 · cs.CL · June 12, 2025

TL;DR

RePO is a replay-enhanced policy optimization method for reinforcement learning with large language models. By reusing diverse off-policy samples from a replay buffer, it improves performance on mathematical reasoning benchmarks over the on-policy GRPO baseline.

Contribution

This paper presents RePO, a replay-based policy optimization technique that improves data efficiency and performance in RL for large language models, outperforming the on-policy GRPO baseline.

Findings

01

Compared to GRPO, RePO improves absolute average performance by 18.4 points on Qwen2.5-Math-1.5B and 4.1 points on Qwen3-1.7B.

02

Compared to GRPO, RePO increases computational cost by 15%.

03

The number of effective optimization steps increases by 48%.
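GRPO, the baseline RePO is compared against, normalizes each output's reward against the other outputs sampled for the same prompt. The minimal Python sketch below makes some assumptions. Rewards are binary (1.0 correct, 0.0 incorrect). The group statistic is the population standard deviation. `group_relative_advantages` and `is_effective` are hypothetical names. The sketch shows why a group whose outputs all receive the same reward yields zero advantage and therefore no gradient. It also shows how mixing in earlier samples can restore a learning signal. Treating this as the meaning of "effective optimization steps" is an assumption here, not the paper's stated definition.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: (reward - group mean) / (group std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mean) / (std + eps) for r in rewards]


def is_effective(rewards):
    """Hypothetical criterion: a group gives a nonzero gradient only if some advantage is nonzero."""
    return any(a != 0.0 for a in group_relative_advantages(rewards))


# Assumed binary rewards: 1.0 = correct answer, 0.0 = incorrect.
on_policy = [1.0, 1.0, 1.0, 1.0]  # every fresh sample is correct
print(is_effective(on_policy))  # False: all advantages are exactly 0.0

replayed = [0.0, 1.0, 0.0, 0.0]  # hypothetical earlier samples for the same prompt
print(is_effective(on_policy + replayed))  # True: the mixed group has reward variance
```

The `eps` term only guards against division by zero. When every reward in a group is identical, each numerator is already exactly zero, so the whole group contributes nothing to the update.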

Abstract

Reinforcement learning (RL) is vital for optimizing large language models (LLMs). The recent Group Relative Policy Optimization (GRPO) algorithm estimates advantages using multiple on-policy outputs per prompt, which leads to high computational cost and low data efficiency. To address this, we introduce Replay-Enhanced Policy Optimization (RePO), which leverages diverse replay strategies to retrieve off-policy samples from a replay buffer, allowing policy optimization based on a broader and more diverse set of samples for each prompt. Experiments on five LLMs across seven mathematical reasoning benchmarks demonstrate that RePO achieves absolute average performance gains of $18.4$ and $4.1$ points for Qwen2.5-Math-1.5B and Qwen3-1.7B, respectively, compared to GRPO. Further analysis indicates that RePO increases computational cost by $15\%$ while raising the number of effective optimization steps by…
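The abstract describes retrieving off-policy samples from a replay buffer and optimizing over a larger per-prompt group. The minimal Python sketch below rests on several assumptions:

* Samples are stored per prompt in a bounded buffer.
* Two hypothetical strategies, `recency` and `random`, stand in for the paper's replay strategies, which this summary does not specify.
* `Sample`, `PromptReplayBuffer`, and `build_group` are illustrative names, not the API of the sihengli99/repo repository.

A full off-policy update would typically also need the behavior policy's log-probabilities (for example, for importance weighting). Those are omitted here.

```python
import random
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Sample:
    response: str
    reward: float
    step: int  # training step at which this response was generated


class PromptReplayBuffer:
    """Hypothetical per-prompt replay buffer (illustrative, not the RePO code)."""

    def __init__(self, capacity_per_prompt=32, seed=0):
        # Each prompt keeps at most `capacity_per_prompt` samples; the oldest are evicted first.
        self._store = defaultdict(lambda: deque(maxlen=capacity_per_prompt))
        self._rng = random.Random(seed)

    def add(self, prompt, samples):
        self._store[prompt].extend(samples)

    def retrieve(self, prompt, k, strategy="recency"):
        pool = list(self._store.get(prompt, ()))
        if strategy == "recency":
            # Newest samples first: generated by the most recent policies.
            return sorted(pool, key=lambda s: s.step, reverse=True)[:k]
        if strategy == "random":
            # Uniform sampling without replacement over everything stored for this prompt.
            return self._rng.sample(pool, min(k, len(pool)))
        raise ValueError(f"unknown replay strategy: {strategy!r}")


def build_group(on_policy, buffer, prompt, k_off, strategy="recency"):
    """Concatenate fresh on-policy samples with replayed off-policy ones."""
    # Retrieve before adding, so every replayed sample comes from an earlier step.
    off_policy = buffer.retrieve(prompt, k_off, strategy)
    buffer.add(prompt, on_policy)
    return on_policy + off_policy


buf = PromptReplayBuffer(capacity_per_prompt=8)
g0 = build_group([Sample("a", 1.0, 0), Sample("b", 0.0, 0)], buf, "q1", k_off=2)
g1 = build_group([Sample("c", 1.0, 1), Sample("d", 1.0, 1)], buf, "q1", k_off=2)
print(len(g0), len(g1))  # 2 4
```

In the second call, both fresh samples have reward 1.0. On their own, they would give a GRPO-style group zero advantage. The replayed step-0 samples add a 0.0 reward and restore reward variance. Retrieving before inserting keeps the current step's outputs from appearing twice in the same group.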

Code & Models

Repositories

sihengli99/repo (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Reinforcement Learning in Robotics · Multimodal Machine Learning Applications

Methods: Sparse Evolutionary Training