Hindsight Experience Replay Accelerates Proximal Policy Optimization
Douglas C. Crowder, Darrien M. McKenzie, Matthew L. Trappett, Frances S. Chance

TL;DR
This paper demonstrates that integrating hindsight experience replay (HER) with proximal policy optimization (PPO) substantially speeds up learning in a sparse-reward predator-prey environment, challenging the assumption that HER is only suitable for off-policy algorithms.
Contribution
The authors show that HER can be effectively combined with on-policy PPO, expanding the applicability of HER beyond off-policy methods.
Findings
HER accelerates PPO in sparse reward tasks
HER improves sample efficiency of PPO
HER combined with PPO outperforms baseline methods
Abstract
Hindsight experience replay (HER) accelerates off-policy reinforcement learning algorithms for environments that emit sparse rewards by modifying the goal of the episode post-hoc to be some state achieved during the episode. Because post-hoc modification of the observed goal violates the assumptions of on-policy algorithms, HER is not typically applied to on-policy algorithms. Here, we show that HER can dramatically accelerate proximal policy optimization (PPO), an on-policy reinforcement learning algorithm, when tested on a custom predator-prey environment.
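The post-hoc goal modification described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Transition` record, the grid-style states, the `sparse_reward` function, and the choice to relabel with the final achieved state are all assumptions made for illustration (the abstract only says the new goal is "some state achieved during the episode"). How the relabeled data is then fed into PPO is not specified here and is not shown.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

State = Tuple[int, int]  # hypothetical 2-D grid position


@dataclass(frozen=True)
class Transition:
    """One step of a goal-conditioned episode (hypothetical record layout)."""
    state: State
    action: int
    next_state: State
    goal: State
    reward: float


def sparse_reward(achieved: State, goal: State) -> float:
    """Assumed sparse reward: 1 only when the goal is reached, else 0."""
    return 1.0 if achieved == goal else 0.0


def relabel_with_final_state(episode: List[Transition]) -> List[Transition]:
    """Return a copy of the episode whose goal is the last state reached.

    Rewards are recomputed against the substituted goal, so an episode
    that failed on its original goal still ends with a nonzero reward.
    """
    if not episode:
        return []
    new_goal = episode[-1].next_state
    return [
        replace(t, goal=new_goal, reward=sparse_reward(t.next_state, new_goal))
        for t in episode
    ]


# A failed episode: the agent never reaches the original goal (5, 5).
original_goal = (5, 5)
path = [(0, 0), (0, 1), (1, 1), (2, 1)]
episode = [
    Transition(s, 0, s_next, original_goal, sparse_reward(s_next, original_goal))
    for s, s_next in zip(path, path[1:])
]
relabeled = relabel_with_final_state(episode)
```

In this toy episode every original reward is 0, while the relabeled copy carries the goal `(2, 1)` and a reward of 1 on its last step. Because the relabeled goal differs from the one the policy was conditioned on when it acted, the data no longer matches the policy that generated it, which is the on-policy concern the abstract raises.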
Taxonomy
Topics: Decision-Making and Behavioral Economics
Methods: Experience Replay
