ARCHER: Aggressive Rewards to Counter bias in Hindsight Experience Replay
Sameera Lanka, Tianfu Wu

TL;DR
ARCHER enhances hindsight experience replay by using more aggressive rewards to counteract bias, leading to improved sample efficiency in deep reinforcement learning for robotic manipulation tasks.
Contribution
The paper introduces ARCHER, a novel method that extends HER with aggressive rewards to mitigate bias and improve sample efficiency in RL.
Findings
ARCHER outperforms standard HER in sample efficiency.
Aggressive rewards effectively counteract bias in HER.
Results are validated on DeepMind Control Suite environments.
Abstract
Experience replay is an important technique for addressing sample-inefficiency in deep reinforcement learning (RL), but faces difficulty in learning from binary and sparse rewards due to disproportionately few successful experiences in the replay buffer. Hindsight experience replay (HER) was recently proposed to tackle this difficulty by manipulating unsuccessful transitions, but in doing so, HER introduces a significant bias in the replay buffer experiences and therefore achieves a suboptimal improvement in sample-efficiency. In this paper, we present an analysis on the source of bias in HER, and propose a simple and effective method to counter the bias, to most effectively harness the sample-efficiency provided by HER. Our method, motivated by counter-factual reasoning and called ARCHER, extends HER with a trade-off to make rewards calculated for hindsight experiences numerically…
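The relabeling idea in the abstract can be illustrated with a minimal sketch. It assumes a binary sparse reward of -1 (failure) or 0 (success), the common "final achieved goal" relabeling strategy, and two hypothetical multipliers, `real_scale` and `hindsight_scale`, that stand in for the reward trade-off. The function names, the dictionary layout, and the multiplier values are illustrative assumptions, not the paper's formulation, which the truncated abstract above does not spell out.

```python
import numpy as np


def sparse_reward(achieved, goal, tol=0.05):
    """Binary sparse reward: 0.0 if the goal is reached within tol, else -1.0."""
    return 0.0 if np.linalg.norm(achieved - goal) <= tol else -1.0


def relabel_episode(episode, real_scale=1.0, hindsight_scale=0.5):
    """Return real and hindsight transitions for one episode.

    episode: list of dicts with keys "state", "action", "achieved", "goal".
    real_scale, hindsight_scale: hypothetical multipliers (assumptions made
    for illustration) applied to real and hindsight rewards respectively.
    """
    # "Final" strategy: pretend the last achieved state was the goal all along.
    final_achieved = episode[-1]["achieved"]
    buffer = []
    for t in episode:
        # Real transition: reward computed against the original goal.
        r_real = real_scale * sparse_reward(t["achieved"], t["goal"])
        buffer.append({**t, "reward": r_real, "hindsight": False})
        # Hindsight transition: same step, goal replaced by the achieved one.
        r_hind = hindsight_scale * sparse_reward(t["achieved"], final_achieved)
        buffer.append(
            {**t, "goal": final_achieved, "reward": r_hind, "hindsight": True}
        )
    return buffer
```

With negative rewards, a `hindsight_scale` below 1 makes a failed hindsight transition less costly than a failed real one; whether that matches the paper's exact trade-off should be checked against the full text.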
Taxonomy
Topics: Reinforcement Learning in Robotics · Neural dynamics and brain function · Mind wandering and attention
Methods: Experience Replay
