Leveraging Sequentiality in Reinforcement Learning from a Single Demonstration
Alexandre Chenu, Olivier Serris, Olivier Sigaud, Nicolas Perrin-Gilbert

TL;DR
This paper introduces DCIL-II, a novel algorithm that leverages sequential goals and a single demonstration to efficiently learn complex robotic control tasks, significantly reducing the need for multiple demonstrations.
Contribution
The paper presents DCIL-II, a new goal-conditioned reinforcement learning method that exploits sequentiality to learn complex tasks from a single demonstration with high sample efficiency.
Findings
Successfully applied to humanoid locomotion and stand-up tasks
Achieved unprecedented sample efficiency in simulated tasks
Enabled fast learning of complex robotic behaviors
Abstract
Deep Reinforcement Learning has been successfully applied to learn robotic control. However, the corresponding algorithms struggle when applied to problems where the agent is only rewarded after achieving a complex task. In this context, using demonstrations can significantly speed up the learning process, but demonstrations can be costly to acquire. In this paper, we propose to leverage a sequential bias to learn control policies for complex robotic tasks using a single demonstration. To do so, our method learns a goal-conditioned policy to control a system between successive low-dimensional goals. This sequential goal-reaching approach raises a problem of compatibility between successive goals: we need to ensure that the state resulting from reaching a goal is compatible with the achievement of the following goals. To tackle this problem, we present a new algorithm called DCIL-II. We…
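The sequential goal-reaching idea described in the abstract can be illustrated with a minimal sketch. This is not DCIL-II itself: it only shows how a single demonstration might be cut into successive low-dimensional goals and how a goal-conditioned policy could be run from one goal to the next. The function names (`project`, `extract_goals`, `run_sequence`), the fixed-spacing rule for choosing goals, and the distance-threshold test for "goal reached" are all assumptions made for illustration, and the sketch does not address the compatibility problem between successive goals that the paper targets.

```python
import math


def project(state, goal_dims):
    """Map a full state to the low-dimensional goal space.

    Assumption: the goal space is a subset of state coordinates.
    """
    return tuple(state[i] for i in goal_dims)


def extract_goals(demo_states, goal_dims, spacing):
    """Cut one demonstration into successive goals.

    A new goal is emitted each time the projected demonstration has
    moved at least `spacing` away from the previous goal. The final
    demonstration state is always kept as the last goal.
    """
    goals = []
    last = project(demo_states[0], goal_dims)
    for state in demo_states[1:]:
        g = project(state, goal_dims)
        if math.dist(g, last) >= spacing:
            goals.append(g)
            last = g
    final = project(demo_states[-1], goal_dims)
    if not goals or goals[-1] != final:
        goals.append(final)
    return goals


def run_sequence(policy, step, state, goals, goal_dims, tol, max_steps):
    """Run a goal-conditioned policy through the goals in order.

    `policy(state, goal)` returns an action and `step(state, action)`
    returns the next state; both are placeholders supplied by the caller.
    Returns the final state and the number of goals reached.
    """
    idx = 0
    for _ in range(max_steps):
        if idx == len(goals):
            break
        action = policy(state, goals[idx])
        state = step(state, action)
        # Switch to the next goal once the current one is reached.
        if math.dist(project(state, goal_dims), goals[idx]) <= tol:
            idx += 1
    return state, idx
```

In this sketch the policy only ever sees the current goal, so a state that reaches one goal may still be a poor starting point for the next one (for example, the right position but the wrong velocity). That gap is the compatibility issue the abstract raises.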
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Robotic Locomotion and Control
