On the Convergence of Experience Replay in Policy Optimization: Characterizing Bias, Variance, and Finite-Time Convergence

Hua Zheng; Wei Xie; M. Ben Feng

arXiv:2110.08902 · cs.LG · February 4, 2026 · 1 citation

TL;DR

This paper provides a theoretical analysis of experience replay in policy gradient methods, revealing how bias and variance depend on buffer size, data freshness, and dynamics, and establishing finite-time convergence guarantees.

Contribution

It introduces a novel framework and proof technique to analyze dependencies in experience replay, deriving bounds on bias, variance, and convergence in policy optimization.

Findings

01. Bias grows with the age of buffered data, the cumulative policy update, and the mixing time of the underlying dynamics.

02. Variance decreases with larger buffers and less correlated samples.

03. Finite-time convergence depends on buffer size, mixing time, and sample correlation.
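
To make the variance finding concrete, here is a small worked formula in Python. It is an illustration only, not a result from the paper: it assumes the replayed samples are stationary with variance `sigma2` and lag-k correlation `rho**k` (an AR(1)-style assumption introduced here), and the function name `variance_of_mean` is hypothetical. Under that assumption, the variance of an average over `n` samples falls as `n` grows and rises as `rho` approaches 1.

```python
def variance_of_mean(n, rho, sigma2=1.0):
    """Variance of the average of n stationary samples.

    Assumption (illustrative, not from the paper): the correlation between
    samples k steps apart is rho**k, and each sample has variance sigma2.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Standard identity: Var(mean) = sigma2/n * (1 + 2 * sum_{k=1}^{n-1} (1 - k/n) * rho^k)
    corr_sum = sum((1.0 - k / n) * rho ** k for k in range(1, n))
    return sigma2 / n * (1.0 + 2.0 * corr_sum)


if __name__ == "__main__":
    # Larger n stands in for a larger buffer; larger rho for more correlated samples.
    for n in (10, 100, 1000):
        for rho in (0.0, 0.5, 0.9):
            print(f"n={n:5d}  rho={rho:.1f}  var={variance_of_mean(n, rho):.5f}")
```

With `rho = 0` the expression reduces to the familiar `sigma2 / n` for independent samples; any positive correlation inflates it, which is the qualitative pattern the finding describes.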

Abstract

Experience replay is a core ingredient of modern deep reinforcement learning, yet its benefits in policy optimization are poorly understood beyond empirical heuristics. This paper develops a novel theoretical framework for experience replay in modern policy gradient methods, where two sources of dependence fundamentally complicate analysis: Markovian correlations along trajectories and policy drift across optimization iterations. We introduce a new proof technique based on auxiliary Markov chains and lag-based decoupling that makes these dependencies tractable. Within this framework, we derive finite-time bias bounds for policy-gradient estimators under replay, identifying how bias scales with the cumulative policy update, the mixing time of the underlying dynamics, and the age of buffered data, thereby formalizing the practitioner's rule of avoiding overly stale replay. We further…
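
The practitioner's rule of avoiding overly stale replay can be sketched as a buffer that tags each transition with the policy iteration that produced it and samples only from recent ones. This is a minimal sketch under stated assumptions, not the authors' algorithm: the class `AgeLimitedReplayBuffer`, the `max_age` threshold, and measuring age in policy iterations are all hypothetical choices made for illustration.

```python
import random
from collections import deque


class AgeLimitedReplayBuffer:
    """Replay buffer that ignores transitions older than max_age iterations.

    Hypothetical illustration: age is measured in policy-update iterations,
    and the oldest entries are evicted once capacity is reached.
    """

    def __init__(self, capacity, max_age):
        self.max_age = max_age
        # Each entry is (iteration_collected, transition); deque drops the oldest.
        self._items = deque(maxlen=capacity)

    def add(self, transition, iteration):
        self._items.append((iteration, transition))

    def fresh(self, current_iteration):
        """Return transitions whose age does not exceed max_age."""
        return [
            transition
            for (iteration, transition) in self._items
            if current_iteration - iteration <= self.max_age
        ]

    def sample(self, batch_size, current_iteration, rng=random):
        """Sample without replacement from the fresh transitions only."""
        candidates = self.fresh(current_iteration)
        k = min(batch_size, len(candidates))
        return rng.sample(candidates, k)


if __name__ == "__main__":
    buf = AgeLimitedReplayBuffer(capacity=100, max_age=3)
    for it in range(10):
        buf.add(("state", it), iteration=it)
    # Only transitions collected within the last 3 iterations are eligible.
    print(buf.sample(batch_size=2, current_iteration=9))
```

A small `max_age` keeps the sampled data close to the current policy at the cost of fewer usable samples, while a large one reuses more data but admits older samples; how to set the threshold is the kind of trade-off the paper's bounds are meant to inform.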

Code & Models

Repositories

zhenghuazx/vrer_policy_gradient (tf, Official)

Taxonomy

Topics: Mental Health Research Topics · Mind wandering and attention

Methods: Experience Replay