Variance Reduction based Experience Replay for Policy Optimization

Hua Zheng; Wei Xie; M. Ben Feng

arXiv:2208.12341 · stat.ML · September 13, 2022 · 1 cites

TL;DR

This paper introduces a variance reduction experience replay framework that selectively reuses relevant past samples to improve policy gradient estimates, accelerating reinforcement learning in complex stochastic systems.

Contribution

The paper proposes a novel VRER method that adaptively prioritizes samples for more efficient policy optimization, outperforming uniform replay strategies.

Findings

01 VRER accelerates policy learning.

02 VRER improves policy performance.

03 Theoretical and empirical validation of VRER.

Abstract

For reinforcement learning on complex stochastic systems where many factors dynamically impact the output trajectories, it is desirable to effectively leverage the information from historical samples collected in previous iterations to accelerate policy optimization. Classical experience replay allows agents to remember by reusing historical observations. However, the uniform reuse strategy that treats all observations equally overlooks the relative importance of different samples. To overcome this limitation, we propose a general variance reduction based experience replay (VRER) framework that can selectively reuse the most relevant samples to improve policy gradient estimation. This selective mechanism can adaptively put more weight on past samples that are more likely to be generated by the current target distribution. Our theoretical and empirical studies show that the proposed VRER…
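The idea of weighting past samples by how likely they are under the current target distribution can be illustrated with a small sketch. The code below is not the paper's VRER algorithm; it is a generic mixture likelihood ratio (multiple importance sampling) estimator for a one-dimensional Gaussian policy with a learnable mean. The function names, the Gaussian policy, and the quadratic reward used in the usage note are all assumptions made for illustration.

```python
import numpy as np


def gaussian_pdf(x, mean, std=1.0):
    """Density of N(mean, std^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def mixture_weights(actions, target_mean, behavior_means, std=1.0):
    """Ratio of the target density to the equal-weight mixture of past policies.

    Assumes each past (behavior) policy contributed the same number of samples.
    """
    target = gaussian_pdf(actions, target_mean, std)
    mixture = np.mean([gaussian_pdf(actions, m, std) for m in behavior_means], axis=0)
    return target / mixture


def weighted_policy_gradient(actions, rewards, target_mean, behavior_means, std=1.0):
    """Reweighted score-function estimate of dJ/d(target_mean) from reused samples."""
    w = mixture_weights(actions, target_mean, behavior_means, std)
    score = (actions - target_mean) / std**2  # gradient of log N(a; mean, std^2) w.r.t. mean
    return np.mean(w * score * rewards)


# Hypothetical usage: reuse samples drawn under three past policy means.
rng = np.random.default_rng(0)
past_means = [0.0, 0.5, 2.0]
actions = np.concatenate([rng.normal(m, 1.0, size=2000) for m in past_means])
rewards = -(actions - 1.0) ** 2  # assumed reward, maximized at a = 1
grad = weighted_policy_gradient(actions, rewards, target_mean=0.6, behavior_means=past_means)
print(grad)
```

In this sketch, samples drawn under a past policy close to the current one (mean 0.5) receive weights near or above one, while samples far out in the tail of the mean 2.0 policy are down-weighted. A selective scheme such as the one the abstract describes would go further and decide which past policies to include at all.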

Code & Models

Repositories

zhenghuazx/vrer_policy_gradient
tf · Official

Taxonomy

Topics: Reinforcement Learning in Robotics · Mental Health Research Topics · Neural dynamics and brain function

Methods: Experience Replay