Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards
Mel Vecerik, Todd Hester, Jonathan Scholz, Fumin Wang, Olivier Pietquin, Bilal Piot, Nicolas Heess, Thomas Rothörl, Thomas Lampe, Martin Riedmiller

TL;DR
This paper introduces a model-free reinforcement learning method that leverages demonstrations to improve learning efficiency in robotic tasks with sparse rewards, eliminating the need for engineered shaping rewards.
Contribution
The authors propose a demonstration-augmented DDPG algorithm with automatic replay sampling ratio tuning, reducing reliance on reward engineering in robotic RL tasks.
Findings
Demonstration-based DDPG outperforms standard DDPG in simulated insertion tasks.
The method is also applied successfully on a real robot to insert a flexible clip into a rigid object.
Demonstrations replace the need for carefully designed reward functions.
Abstract
We propose a general and model-free approach for Reinforcement Learning (RL) on real robotics with sparse rewards. We build upon the Deep Deterministic Policy Gradient (DDPG) algorithm to use demonstrations. Both demonstrations and actual interactions are used to fill a replay buffer and the sampling ratio between demonstrations and transitions is automatically tuned via a prioritized replay mechanism. Typically, carefully engineered shaping rewards are required to enable the agents to efficiently explore on high dimensional control problems such as robotics. They are also required for model-based acceleration methods relying on local solvers such as iLQG (e.g. Guided Policy Search and Normalized Advantage Function). The demonstrations replace the need for carefully engineered rewards, and reduce the exploration problem encountered by classical RL approaches in these domains.…
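The replay mechanism described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the class name `MixedReplayBuffer`, the constants `alpha`, `eps`, and `demo_bonus`, and the priority formula (absolute TD error plus a fixed bonus for demonstrations) are all assumptions chosen for illustration, and importance-sampling corrections are left out. What it shows is the general idea: demonstrations and agent transitions share one buffer, and the share of demonstrations in each batch comes from the priorities rather than from a hand-set ratio.

```python
import random


class MixedReplayBuffer:
    """Hypothetical prioritized buffer mixing demonstration and agent transitions."""

    def __init__(self, agent_capacity, alpha=0.6, eps=1e-3, demo_bonus=0.5):
        self.agent_capacity = agent_capacity  # demonstrations are never evicted
        self.alpha = alpha            # assumed exponent: 0 = uniform, 1 = fully prioritized
        self.eps = eps                # keeps every priority strictly positive
        self.demo_bonus = demo_bonus  # assumed extra priority for demonstrations
        self.demo = []                # entries are [transition, priority]
        self.agent = []

    def _new_priority(self):
        # New transitions start at the current maximum so they are replayed soon.
        return max((p for _, p in self.demo + self.agent), default=1.0)

    def add(self, transition, is_demo=False):
        entry = [transition, self._new_priority()]
        if is_demo:
            self.demo.append(entry)
        else:
            self.agent.append(entry)
            if len(self.agent) > self.agent_capacity:
                self.agent.pop(0)  # drop the oldest agent transition

    def sample(self, batch_size, rng=random):
        # Returned indices are only valid until the next call to add().
        entries = self.demo + self.agent
        weights = [p ** self.alpha for _, p in entries]
        picked = rng.choices(range(len(entries)), weights=weights, k=batch_size)
        return [(i, entries[i][0], i < len(self.demo)) for i in picked]

    def update_priority(self, index, td_error):
        is_demo = index < len(self.demo)
        entry = self.demo[index] if is_demo else self.agent[index - len(self.demo)]
        entry[1] = abs(td_error) + self.eps + (self.demo_bonus if is_demo else 0.0)


# Usage: 10 demonstration transitions next to 90 agent transitions.
buf = MixedReplayBuffer(agent_capacity=1000)
for t in range(10):
    buf.add(("demo", t), is_demo=True)
for t in range(90):
    buf.add(("agent", t))

batch = buf.sample(32, rng=random.Random(0))
demo_share = sum(is_demo for _, _, is_demo in batch) / len(batch)
print(f"demonstration share of this batch: {demo_share:.2f}")
```

In a training loop, `update_priority` would be called with each sampled transition's TD error after the critic update, so the demonstration share drifts as the errors on agent data grow or shrink.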
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Adversarial Robustness in Machine Learning
Methods: Experience Replay · Dense Connections · Weight Decay · Adam · Convolution · Batch Normalization · Deep Deterministic Policy Gradient
