Maximum Entropy Hindsight Experience Replay

Douglas C. Crowder; Matthew L. Trappett; Darrien M. McKenzie; Frances S. Chance

arXiv:2410.24016·cs.LG·November 1, 2024

TL;DR

This paper improves goal-based reinforcement learning by selectively applying Hindsight Experience Replay (HER) in on-policy algorithms such as proximal policy optimization (PPO), leading to more efficient learning in Predator-Prey environments.

Contribution

The paper introduces a principled method for selectively applying HER to on-policy algorithms, improving upon previous PPO-HER implementations.

Findings

01. Selective HER application accelerates learning.

02. Improved PPO-HER outperforms previous methods.

03. Enhanced goal-based RL efficiency.

Abstract

Hindsight experience replay (HER) is well-known to accelerate goal-based reinforcement learning (RL). While HER is generally applied to off-policy RL algorithms, we previously showed that HER can also accelerate on-policy algorithms, such as proximal policy optimization (PPO), for goal-based Predator-Prey environments. Here, we show that we can improve the previous PPO-HER algorithm by selectively applying HER in a principled manner.
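
For readers unfamiliar with HER, the core relabeling step can be sketched as below. This is a minimal illustration of generic HER using the common "final" strategy (replace the intended goal with the state actually reached at the end of the episode), not the selective PPO-HER procedure proposed in the paper, whose selection criterion is not described on this page. The names `Transition`, `sparse_reward`, and `her_relabel_final`, the grid-position state, and the 0/-1 sparse reward are all assumptions made for the example.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

Position = Tuple[int, int]  # hypothetical 2-D grid position


@dataclass(frozen=True)
class Transition:
    state: Position
    action: int
    next_state: Position
    goal: Position
    reward: float


def sparse_reward(achieved: Position, goal: Position) -> float:
    """Assumed sparse reward: 0 when the goal is reached, -1 otherwise."""
    return 0.0 if achieved == goal else -1.0


def her_relabel_final(episode: List[Transition]) -> List[Transition]:
    """Return a copy of the episode relabeled with the 'final' HER strategy.

    The hindsight goal is the state the agent actually ended in, so the
    relabeled episode always ends in success under the assumed reward.
    """
    if not episode:
        return []
    hindsight_goal = episode[-1].next_state
    return [
        replace(
            t,
            goal=hindsight_goal,
            reward=sparse_reward(t.next_state, hindsight_goal),
        )
        for t in episode
    ]


# A failed episode: the agent wanted to reach (5, 5) but stopped at (2, 0).
episode = [
    Transition((0, 0), 0, (1, 0), (5, 5), -1.0),
    Transition((1, 0), 0, (2, 0), (5, 5), -1.0),
]
relabeled = her_relabel_final(episode)
```

In this sketch the original episode contains only failure rewards, while the relabeled copy ends with a success reward for the goal `(2, 0)`, which is what gives the learner a useful signal in sparse-reward, goal-based tasks.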

Taxonomy

Topics: CCD and CMOS Imaging Sensors · Neural dynamics and brain function · Random lasers and scattering media

Methods: Experience Replay