Hindsight Experience Replay Accelerates Proximal Policy Optimization

Douglas C. Crowder; Darrien M. McKenzie; Matthew L. Trappett; Frances S. Chance

arXiv:2410.22524 · cs.LG · October 31, 2024

TL;DR

This paper demonstrates that integrating hindsight experience replay (HER) with proximal policy optimization (PPO) substantially speeds up learning in a sparse-reward, custom predator-prey environment, challenging the assumption that HER is suitable only for off-policy algorithms.

Contribution

The authors show that HER can be effectively combined with on-policy PPO, expanding the applicability of HER beyond off-policy methods.

Findings

1. HER accelerates PPO in sparse-reward tasks.
2. HER improves the sample efficiency of PPO.
3. HER combined with PPO outperforms baseline methods.

Abstract

Hindsight experience replay (HER) accelerates off-policy reinforcement learning algorithms for environments that emit sparse rewards by modifying the goal of the episode post-hoc to be some state achieved during the episode. Because post-hoc modification of the observed goal violates the assumptions of on-policy algorithms, HER is not typically applied to on-policy algorithms. Here, we show that HER can dramatically accelerate proximal policy optimization (PPO), an on-policy reinforcement learning algorithm, when tested on a custom predator-prey environment.
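The goal relabeling described in the abstract can be illustrated with a minimal sketch of HER's "final" strategy: after an episode ends, the last state the agent actually achieved is treated as if it had been the goal all along, and the sparse reward is recomputed for every step. This is not the authors' implementation. The `Step` record, the 2-D positions, the `sparse_reward` function (1 on success, 0 otherwise) and the tolerance `tol` are all hypothetical choices for illustration, since the predator-prey environment's actual reward is not given here. The sketch also shows why relabeling conflicts with on-policy learning: each stored action was sampled from the policy conditioned on the original goal, not on the substituted one.

```python
import math
from dataclasses import dataclass, replace
from typing import List, Tuple

Vec = Tuple[float, float]  # hypothetical 2-D position


@dataclass(frozen=True)
class Step:
    obs: Vec         # agent state before acting (hypothetical)
    action: int      # action sampled from pi(a | obs, goal)
    achieved: Vec    # state reached after the action
    goal: Vec        # goal the policy was conditioned on
    reward: float


def sparse_reward(achieved: Vec, goal: Vec, tol: float = 0.5) -> float:
    """Hypothetical sparse reward: 1.0 if within `tol` of the goal, else 0.0."""
    return 1.0 if math.dist(achieved, goal) <= tol else 0.0


def relabel_final(episode: List[Step], tol: float = 0.5) -> List[Step]:
    """HER 'final' strategy: substitute the last achieved state as the goal.

    Note: the stored actions were chosen for the ORIGINAL goal, so the
    relabeled data is off-policy with respect to pi(a | obs, new_goal).
    """
    new_goal = episode[-1].achieved
    return [
        replace(s, goal=new_goal, reward=sparse_reward(s.achieved, new_goal, tol))
        for s in episode
    ]


if __name__ == "__main__":
    # An episode that never reaches its goal (10, 10): every reward is 0.
    goal = (10.0, 10.0)
    episode = [
        Step((0.0, 0.0), 1, (1.0, 0.0), goal, 0.0),
        Step((1.0, 0.0), 1, (2.0, 0.0), goal, 0.0),
        Step((2.0, 0.0), 0, (2.0, 1.0), goal, 0.0),
    ]
    for s in relabel_final(episode):
        print(s.goal, s.reward)
```

Here the original episode carries no learning signal, while the relabeled copy ends with a reward of 1.0 for reaching (2.0, 1.0). Relabeled episodes are typically added alongside the originals rather than replacing them.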

Taxonomy

Topics: Decision-Making and Behavioral Economics

Methods: Experience Replay