Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards

Mel Vecerik; Todd Hester; Jonathan Scholz; Fumin Wang; Olivier Pietquin; Bilal Piot; Nicolas Heess; Thomas Rothörl; Thomas Lampe; Martin Riedmiller

arXiv:1707.08817·cs.AI·October 9, 2018·510 cites

TL;DR

This paper introduces a model-free reinforcement learning method that leverages demonstrations to improve learning efficiency in robotic tasks with sparse rewards, eliminating the need for engineered shaping rewards.

Contribution

The authors propose a demonstration-augmented DDPG algorithm with automatic replay sampling ratio tuning, reducing reliance on reward engineering in robotic RL tasks.

Findings

1. Demonstration-based DDPG outperforms standard DDPG in simulated insertion tasks.

2. The method successfully applies to real robotic insertion of a flexible clip.

3. Demonstrations replace the need for carefully designed reward functions.

Abstract

We propose a general and model-free approach for Reinforcement Learning (RL) on real robotics with sparse rewards. We build upon the Deep Deterministic Policy Gradient (DDPG) algorithm to use demonstrations. Both demonstrations and actual interactions are used to fill a replay buffer and the sampling ratio between demonstrations and transitions is automatically tuned via a prioritized replay mechanism. Typically, carefully engineered shaping rewards are required to enable the agents to efficiently explore on high dimensional control problems such as robotics. They are also required for model-based acceleration methods relying on local solvers such as iLQG (e.g. Guided Policy Search and Normalized Advantage Function). The demonstrations replace the need for carefully engineered rewards, and reduce the exploration problem encountered by classical RL approaches in these domains.…
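The replay mechanism described in the abstract can be sketched as a single buffer that holds both demonstration and agent transitions and samples them in proportion to a priority. The sketch below is illustrative only: the class name `MixedReplayBuffer`, the priority formula (absolute TD error plus a small constant, plus a fixed bonus for demonstrations, raised to a power `alpha`), and all default values are assumptions made for this example, not the formulation or hyperparameters used by the authors. It shows how the share of demonstrations in a batch can shift on its own as priorities are updated, without a hand-set ratio.

```python
import random


class MixedReplayBuffer:
    """Hypothetical buffer mixing demonstration and agent transitions.

    Sampling is proportional to a per-transition priority, so the
    demo/agent ratio in a batch is never set by hand.
    """

    def __init__(self, alpha=0.6, eps=1e-3, demo_bonus=1.0, seed=0):
        self.alpha = alpha            # how strongly priorities skew sampling
        self.eps = eps                # keeps every priority strictly positive
        self.demo_bonus = demo_bonus  # assumed extra priority for demonstrations
        self.items = []               # list of (transition, is_demo)
        self.priorities = []
        self.rng = random.Random(seed)

    def _priority(self, td_error, is_demo):
        bonus = self.demo_bonus if is_demo else 0.0
        return (abs(td_error) + self.eps + bonus) ** self.alpha

    def add(self, transition, is_demo, td_error=1.0):
        """Store a transition with an initial TD-error estimate."""
        self.items.append((transition, is_demo))
        self.priorities.append(self._priority(td_error, is_demo))

    def sample(self, batch_size):
        """Draw indices with probability proportional to priority."""
        idx = self.rng.choices(
            range(len(self.items)), weights=self.priorities, k=batch_size
        )
        return idx, [self.items[i] for i in idx]

    def update(self, indices, td_errors):
        """Refresh priorities after a learning step."""
        for i, err in zip(indices, td_errors):
            self.priorities[i] = self._priority(err, self.items[i][1])

    def demo_probability(self):
        """Probability that a single draw returns a demonstration."""
        demo_mass = sum(
            p for p, (_, is_demo) in zip(self.priorities, self.items) if is_demo
        )
        return demo_mass / sum(self.priorities)
```

In this sketch, a learner would call `sample`, compute TD errors for the batch with its critic, and pass them back through `update`; `demo_probability` then reports how much of the sampling mass currently sits on demonstrations.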

Taxonomy

Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Adversarial Robustness in Machine Learning

Methods: Experience Replay · Dense Connections · Weight Decay · Adam · Convolution · Batch Normalization · Deep Deterministic Policy Gradient