Maximum Entropy Hindsight Experience Replay
Douglas C. Crowder, Matthew L. Trappett, Darrien M. McKenzie, Frances S. Chance

TL;DR
This paper enhances goal-based reinforcement learning by improving the application of Hindsight Experience Replay (HER) in on-policy algorithms like PPO, leading to more efficient learning in Predator-Prey environments.
Contribution
The paper introduces a principled method for selectively applying HER to on-policy algorithms, improving upon previous PPO-HER implementations.
Findings
Applying HER selectively, rather than uniformly, accelerates learning.
The improved PPO-HER outperforms the previous PPO-HER implementation.
Goal-based RL becomes more efficient in Predator-Prey environments.
Abstract
Hindsight experience replay (HER) is well-known to accelerate goal-based reinforcement learning (RL). While HER is generally applied to off-policy RL algorithms, we previously showed that HER can also accelerate on-policy algorithms, such as proximal policy optimization (PPO), for goal-based Predator-Prey environments. Here, we show that we can improve the previous PPO-HER algorithm by selectively applying HER in a principled manner.
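For readers unfamiliar with HER, the sketch below illustrates the basic hindsight relabeling idea that the abstract builds on: after an episode that missed its goal, the transitions are copied with the goal replaced by a state the agent actually reached, so the sparse reward becomes informative. This is a generic "final state" relabeling on a toy grid, not the selective criterion proposed in this paper, which the abstract does not specify. The names `Transition`, `sparse_reward`, and `relabel_with_final_state`, as well as the 0 / -1 reward convention, are assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

Position = Tuple[int, int]


@dataclass(frozen=True)
class Transition:
    """One step of a goal-conditioned episode (hypothetical layout)."""
    state: Position       # agent position before the step
    action: int           # illustrative discrete action id
    next_state: Position  # agent position after the step
    goal: Position        # goal the agent was pursuing
    reward: float         # sparse reward under that goal


def sparse_reward(achieved: Position, goal: Position) -> float:
    """Assumed convention: 0.0 when the goal is reached, -1.0 otherwise."""
    return 0.0 if achieved == goal else -1.0


def relabel_with_final_state(episode: List[Transition]) -> List[Transition]:
    """Return a copy of the episode whose goal is the last state reached.

    Rewards are recomputed against the substituted goal, so the final
    transition of the relabeled episode always counts as a success.
    """
    if not episode:
        return []
    hindsight_goal = episode[-1].next_state
    return [
        replace(
            t,
            goal=hindsight_goal,
            reward=sparse_reward(t.next_state, hindsight_goal),
        )
        for t in episode
    ]


# A short failed episode: the agent aimed for (5, 5) but stopped at (1, 1).
original_goal = (5, 5)
episode = [
    Transition((0, 0), 0, (1, 0), original_goal, -1.0),
    Transition((1, 0), 1, (1, 1), original_goal, -1.0),
]
relabeled = relabel_with_final_state(episode)
```

In this toy run every original reward is -1.0, while the relabeled copy ends with a reward of 0.0 because its goal is now the state the agent actually reached. Deciding when to add such relabeled episodes to the training data is the kind of choice the selective approach described above is concerned with.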
Taxonomy
Topics: CCD and CMOS Imaging Sensors · Neural dynamics and brain function · Random lasers and scattering media
Methods: Experience Replay
