ARCHER: Aggressive Rewards to Counter bias in Hindsight Experience Replay

Sameera Lanka; Tianfu Wu

arXiv:1809.02070 · cs.LG · September 10, 2018 · 21 cites

TL;DR

ARCHER extends hindsight experience replay (HER) with more aggressive hindsight rewards to counteract the bias that HER introduces into the replay buffer, improving sample efficiency in deep reinforcement learning for robotic manipulation tasks.

Contribution

The paper analyzes the source of bias in hindsight experience replay (HER) and introduces ARCHER, a method that extends HER with more aggressive hindsight rewards to counter that bias and improve sample efficiency in RL.

Findings

01. ARCHER outperforms standard HER in sample efficiency.

02. Aggressive rewards effectively counteract bias in HER.

03. Results are validated on DeepMind Control Suite environments.

Abstract

Experience replay is an important technique for addressing sample-inefficiency in deep reinforcement learning (RL), but faces difficulty in learning from binary and sparse rewards due to disproportionately few successful experiences in the replay buffer. Hindsight experience replay (HER) was recently proposed to tackle this difficulty by manipulating unsuccessful transitions, but in doing so, HER introduces a significant bias in the replay buffer experiences and therefore achieves a suboptimal improvement in sample-efficiency. In this paper, we present an analysis on the source of bias in HER, and propose a simple and effective method to counter the bias, to most effectively harness the sample-efficiency provided by HER. Our method, motivated by counter-factual reasoning and called ARCHER, extends HER with a trade-off to make rewards calculated for hindsight experiences numerically…
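
As a rough illustration of the hindsight relabeling the abstract describes, the Python sketch below replays each transition of an unsuccessful episode a second time with the goal swapped for the state the episode actually reached. Everything named here (`Transition`, `sparse_reward`, `relabel_episode`, `hindsight_scale`) is hypothetical and not taken from the paper. The 2-D point states, the {-1, 0} reward, and the use of the final state as the substitute goal are assumptions made for the example. The abstract above is truncated, so the exact form of ARCHER's reward trade-off is not reproduced; `hindsight_scale` is only a stand-in knob showing where hindsight rewards could be treated differently from real ones.

```python
from typing import List, NamedTuple, Tuple

Point = Tuple[float, float]


class Transition(NamedTuple):
    state: Point
    next_state: Point
    goal: Point
    reward: float
    hindsight: bool  # True if the goal was substituted after the fact


def sparse_reward(achieved: Point, goal: Point, tol: float = 0.05) -> float:
    """Binary sparse reward (assumed form): 0.0 on success, -1.0 otherwise."""
    dist = ((achieved[0] - goal[0]) ** 2 + (achieved[1] - goal[1]) ** 2) ** 0.5
    return 0.0 if dist <= tol else -1.0


def relabel_episode(
    states: List[Point],
    goal: Point,
    hindsight_scale: float = 1.0,  # hypothetical knob; 1.0 leaves rewards unscaled
) -> List[Transition]:
    """Return the real transitions plus one hindsight copy of each.

    The hindsight copy pretends the final state of the episode was the goal
    all along, so at least the last transition counts as a success.
    """
    achieved_goal = states[-1]
    out: List[Transition] = []
    for s, s_next in zip(states, states[1:]):
        # Real experience: reward computed against the original goal.
        out.append(Transition(s, s_next, goal, sparse_reward(s_next, goal), False))
        # Hindsight experience: reward recomputed against the substituted goal,
        # then scaled by the illustrative factor.
        r_h = hindsight_scale * sparse_reward(s_next, achieved_goal)
        out.append(Transition(s, s_next, achieved_goal, r_h, True))
    return out


# An episode that never reaches the goal at (2.0, 0.0).
episode = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)]
buffer = relabel_episode(episode, goal=(2.0, 0.0), hindsight_scale=0.5)
for t in buffer:
    print(t)
```

With rewards in {-1, 0}, a `hindsight_scale` below 1.0 makes the failing hindsight rewards numerically larger (less negative) than the corresponding real rewards, while successes stay at 0.0. Whether this matches the paper's actual trade-off cannot be confirmed from the truncated abstract.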

Code & Models

Repositories

Baichenjia/BHER (TensorFlow)

Taxonomy

Topics: Reinforcement Learning in Robotics · Neural dynamics and brain function · Mind wandering and attention

Methods: Experience Replay