Hindsight Goal Ranking on Replay Buffer for Sparse Reward Environment
Tung M. Luu, Chang D. Yoo

TL;DR
This paper introduces Hindsight Goal Ranking (HGR), a prioritized replay method for sparse-reward environments that samples experiences with larger TD error more often, which speeds up training.
Contribution
HGR is a novel replay sampling method that prioritizes experiences based on TD error, enhancing learning speed over uniform sampling in robotic tasks.
Findings
HGR learns significantly faster than uniform sampling.
HGR is more sample-efficient than previous methods across multiple robotic tasks.
Empirical results demonstrate improved training speed and efficiency.
Abstract
This paper proposes Hindsight Goal Ranking (HGR), a method for prioritizing replay experience that overcomes a limitation of Hindsight Experience Replay (HER), which generates hindsight goals by uniform sampling. HGR samples with higher probability the states visited in an episode that have larger temporal difference (TD) error, which is treated as a proxy for how much the RL agent can learn from an experience. Sampling for large TD error is performed in two steps: first, an episode is sampled from the replay buffer according to the average TD error of its experiences; then, for the sampled episode, a hindsight goal leading to larger TD error is sampled with higher probability from the future visited states. The proposed method combined with Deep Deterministic Policy Gradient (DDPG), an off-policy model-free actor-critic algorithm,…
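The two-step sampling described in the abstract can be illustrated with a minimal sketch. Everything beyond the abstract is an assumption made for illustration: the function names `sample_episode` and `sample_goal`, the layout of the TD-error table (one `(T, T)` array per episode, where entry `[t, g]` with `g > t` holds the TD error of step `t` relabelled with the state visited at step `g`), the small constant `eps`, and plain proportional priorities. The abstract excerpt does not give the paper's exact priority formula or any bias correction, so this should not be read as the authors' implementation.

```python
import numpy as np


def sample_episode(td_tables, rng, eps=1e-6):
    """Step 1: pick an episode with probability proportional to its mean |TD error|.

    td_tables is a list of (T, T) arrays (hypothetical layout); only the
    strictly upper-triangular entries (g > t) are treated as valid.
    """
    priorities = np.array([
        np.abs(table[np.triu_indices_from(table, k=1)]).mean() + eps
        for table in td_tables
    ])
    probs = priorities / priorities.sum()
    return rng.choice(len(td_tables), p=probs), probs


def sample_goal(td_table, t, rng, eps=1e-6):
    """Step 2: for step t, pick a future step g with probability
    proportional to |TD error| of the (t, g) relabelled transition."""
    future = np.arange(t + 1, td_table.shape[0])
    weights = np.abs(td_table[t, future]) + eps
    probs = weights / weights.sum()
    return rng.choice(future, p=probs), probs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    low = np.zeros((4, 4))                    # episode with nothing left to learn
    high = np.triu(np.ones((4, 4)), k=1)      # episode with non-zero TD errors
    high[0, 3] = 9.0                          # one goal with a much larger error
    episode, episode_probs = sample_episode([low, high], rng)
    goal, goal_probs = sample_goal(high, 0, rng)
    print(episode, episode_probs, goal, goal_probs)
```

How the step `t` itself is chosen within the sampled episode is left open here; the sketch takes it as an argument. The constant `eps` only keeps zero-error entries from becoming unsampleable.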
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Robot Manipulation and Learning
Methods: Experience Replay
