Overcoming Exploration in Reinforcement Learning with Demonstrations

Ashvin Nair; Bob McGrew; Marcin Andrychowicz; Wojciech Zaremba; Pieter Abbeel

arXiv:1709.10089 · cs.LG · February 27, 2018

TL;DR

This paper introduces a reinforcement learning method that leverages demonstrations to efficiently learn complex, long-horizon robotics tasks with sparse rewards, significantly outperforming traditional RL and behavior cloning.

Contribution

The paper presents a simple, effective approach that combines Deep Deterministic Policy Gradients (DDPG) and Hindsight Experience Replay (HER) with demonstrations, enabling it to learn tasks that neither RL nor imitation learning could solve alone.
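
This summary does not say how the demonstrations enter training. One common pattern with off-policy learners such as DDPG is to keep demonstration transitions in their own replay buffer and draw a fixed share of every minibatch from it. The sketch below illustrates only that pattern, as an assumption; `sample_mixed_batch`, `demo_fraction`, and the `Transition` tuple are hypothetical names and not the authors' code.

```python
import random
from typing import List, Tuple

# Placeholder transition type for illustration: (state, action, reward).
Transition = Tuple[str, str, float]


def sample_mixed_batch(
    agent_buffer: List[Transition],
    demo_buffer: List[Transition],
    batch_size: int,
    demo_fraction: float,
    rng: random.Random,
) -> List[Transition]:
    """Draw a minibatch in which a fixed share comes from demonstrations.

    Hypothetical helper: the split rule and parameter names are assumptions.
    """
    num_demo = min(int(batch_size * demo_fraction), batch_size)
    num_agent = batch_size - num_demo
    # Sample with replacement so a small demonstration buffer can still
    # supply its share of every batch.
    demo_part = rng.choices(demo_buffer, k=num_demo)
    agent_part = rng.choices(agent_buffer, k=num_agent)
    return demo_part + agent_part
```

With `batch_size=8` and `demo_fraction=0.25`, two transitions per batch would come from the demonstration buffer and six from the agent's own experience. The right fraction is a tuning choice that this page does not specify.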

Findings

1. Achieves an order-of-magnitude speedup over RL alone on simulated robotics tasks.

2. Successfully learns long-horizon, multi-step robotics tasks with sparse rewards.

3. Outperforms both RL and behavior cloning on complex tasks.

Abstract

Exploration in environments with sparse rewards has been a persistent problem in reinforcement learning (RL). Many tasks are natural to specify with a sparse reward, and manually shaping a reward function can result in suboptimal performance. However, finding a non-zero reward is exponentially more difficult with increasing task horizon or action dimensionality. This puts many real-world tasks out of practical reach of RL methods. In this work, we use demonstrations to overcome the exploration problem and successfully learn to perform long-horizon, multi-step robotics tasks with continuous control such as stacking blocks with a robot arm. Our method, which builds on top of Deep Deterministic Policy Gradients and Hindsight Experience Replay, provides an order of magnitude of speedup over RL on simulated robotics tasks. It is simple to implement and makes only the additional assumption…
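
The abstract's claim that finding a reward gets exponentially harder can be illustrated with a toy calculation. The sketch below assumes a hypothetical task in which a uniformly random policy must choose the single correct action out of `num_actions` at each of `horizon` steps before it sees any reward. The toy model and the function names are illustrative assumptions, not taken from the paper.

```python
def random_success_probability(num_actions: int, horizon: int) -> float:
    """Chance that uniform random exploration hits the one rewarding
    action sequence in the toy task (an illustrative assumption)."""
    return (1.0 / num_actions) ** horizon


def expected_episodes_to_first_reward(num_actions: int, horizon: int) -> float:
    """Mean of a geometric distribution with success probability p: 1 / p."""
    return 1.0 / random_success_probability(num_actions, horizon)


if __name__ == "__main__":
    for horizon in (1, 5, 10, 20):
        episodes = expected_episodes_to_first_reward(num_actions=4, horizon=horizon)
        print(f"horizon={horizon:2d}  expected episodes ~ {episodes:.3g}")
```

Under these assumptions, each extra step multiplies the expected number of episodes before the first reward by `num_actions`. A sparse reward gives no learning signal until that first success, and in this toy model that is the gap demonstrations are meant to bridge.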

Taxonomy

Methods: Experience Replay