Variance Reduction Based Experience Replay for Policy Optimization

Hua Zheng; Wei Xie; M. Ben Feng; Keilung Choy

arXiv:2602.05379 · stat.ML · February 6, 2026

Open Access

TL;DR

This paper introduces VRER, a variance reduction framework for experience replay in reinforcement learning, improving policy optimization efficiency by selectively reusing informative samples and providing theoretical convergence guarantees.

Contribution

We propose VRER, a novel, theoretically grounded experience replay method that reduces variance and accelerates policy learning in reinforcement learning algorithms.

Findings

01. VRER accelerates policy learning across various tasks.

02. VRER improves performance over existing policy optimization methods.

03. Theoretical analysis confirms convergence and a bias-variance trade-off.

Abstract

Effective reinforcement learning (RL) for complex stochastic systems requires leveraging historical data collected in previous iterations to accelerate policy optimization. Classical experience replay treats all past observations uniformly and fails to account for their varying contributions to learning. To overcome this limitation, we propose Variance Reduction Experience Replay (VRER), a principled framework that selectively reuses informative samples to reduce variance in policy gradient estimation. VRER is algorithm-agnostic and integrates seamlessly with existing policy optimization methods, forming the basis of our sample-efficient off-policy algorithm, Policy Gradient with VRER (PG-VRER). Motivated by the lack of rigorous theoretical analysis of experience replay, we develop a novel framework that explicitly captures dependencies introduced by Markovian dynamics and…
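To make the idea of selective reuse concrete, here is a minimal sketch of an importance-weighted policy gradient estimator that reuses only some historical batches. The abstract does not state the selection rule, so everything specific here is an assumption for illustration: the one-dimensional Gaussian policy with unit variance, the function names, the threshold `c`, and the rule itself (reuse a historical batch only if the sample variance of its importance-weighted gradient terms is at most `c` times that of the current on-policy batch). It should not be read as the authors' PG-VRER algorithm.

```python
import numpy as np

def log_density(a, mean):
    # Log-density of N(mean, 1) up to an additive constant,
    # which cancels when forming likelihood ratios.
    return -0.5 * (a - mean) ** 2

def per_sample_gradients(actions, rewards, theta_target, theta_behavior):
    # Likelihood ratio between the target policy and the policy
    # that generated the samples.
    weights = np.exp(log_density(actions, theta_target)
                     - log_density(actions, theta_behavior))
    # Score function of N(theta, 1) with respect to theta.
    score = actions - theta_target
    return weights * rewards * score

def select_and_estimate(theta, current_batch, history, c=1.5):
    # current_batch: (actions, rewards) sampled under the current theta.
    # history: list of (theta_k, actions_k, rewards_k) from past iterations.
    a_cur, r_cur = current_batch
    g_cur = per_sample_gradients(a_cur, r_cur, theta, theta)
    baseline_var = g_cur.var(ddof=1)

    reused = [g_cur]
    selected = []
    for k, (theta_k, a_k, r_k) in enumerate(history):
        g_k = per_sample_gradients(a_k, r_k, theta, theta_k)
        # Hypothetical selection rule (an assumption, not from the source):
        # keep the batch only if reusing it does not inflate variance
        # beyond c times the on-policy level.
        if g_k.var(ddof=1) <= c * baseline_var:
            reused.append(g_k)
            selected.append(k)

    # Pool the current batch with the selected historical batches.
    return np.concatenate(reused).mean(), selected

# Toy usage with hand-picked numbers (not data from the paper).
actions = np.array([-1.0, 0.0, 1.0])
rewards = np.ones(3)
history = [(0.0, actions, rewards),   # generated by a policy equal to the target
           (2.0, actions, rewards)]   # generated by a distant policy
estimate, selected = select_and_estimate(0.0, (actions, rewards), history)
print(estimate, selected)
```

In this toy setup the batch from the distant policy receives large likelihood ratios on some samples, so its gradient terms have a much larger sample variance and the rule skips it. One caveat of any rule based on empirical variance is that small batches can underestimate the true variance of importance-weighted terms.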

Taxonomy

Topics: Reinforcement Learning in Robotics · Age of Information Optimization · Advanced Bandit Algorithms Research