Goal-conditioned Imitation Learning
Yiming Ding, Carlos Florensa, Mariano Phielipp, Pieter Abbeel

TL;DR
This paper introduces a goal-conditioned imitation learning approach that leverages demonstrations to efficiently train policies capable of reaching diverse goals in robotics, reducing sample complexity and handling demonstrations without action data.
Contribution
The work presents a novel method integrating demonstrations into goal-conditioned RL, improving convergence speed and performance, even with actionless trajectories.
Findings
Speeds up policy learning compared to traditional HER.
Effective with demonstrations lacking action information.
Surpasses prior imitation learning methods in goal-reaching tasks.
Abstract
Designing rewards for Reinforcement Learning (RL) is challenging because a reward needs to convey the desired task, be efficient to optimize, and be easy to compute. The last requirement is particularly problematic when applying RL to robotics, where detecting whether the desired configuration is reached might require considerable supervision and instrumentation. Furthermore, we are often interested in being able to reach a wide range of configurations, hence setting up a different reward every time might be impractical. Methods like Hindsight Experience Replay (HER) have recently shown promise to learn policies able to reach many goals, without the need for a reward. Unfortunately, without tricks like resetting to points along the trajectory, HER might require many samples to discover how to reach certain areas of the state-space. In this work we investigate different approaches to incorporate…
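To make the HER idea in the abstract concrete, the following is a minimal sketch of hindsight relabeling, not the authors' implementation. It assumes a sparse 0/-1 reward, states that double as goals, and the "future" relabeling strategy; the names `reached`, `sparse_reward`, and `relabel_future` are hypothetical and chosen only for illustration.

```python
import random
from typing import List, Optional, Tuple

State = Tuple[float, ...]
# A stored transition: (state, action, next_state, goal). Layout is an assumption.
Transition = Tuple[State, int, State, State]


def reached(state: State, goal: State, tol: float = 1e-6) -> bool:
    """True if every coordinate of state is within tol of the goal."""
    return all(abs(s - g) <= tol for s, g in zip(state, goal))


def sparse_reward(next_state: State, goal: State) -> float:
    """Assumed sparse reward: 0 when the goal is reached, -1 otherwise."""
    return 0.0 if reached(next_state, goal) else -1.0


def relabel_future(
    trajectory: List[Transition],
    k: int = 4,
    rng: Optional[random.Random] = None,
) -> List[Tuple[State, int, State, State, float]]:
    """Relabel each transition with k goals drawn from states visited later.

    The original goal is discarded: a state the agent actually reached at the
    same or a later step is treated as if it had been the goal all along.
    """
    rng = rng or random.Random(0)
    relabeled = []
    for t, (state, action, next_state, _goal) in enumerate(trajectory):
        for _ in range(k):
            # Sample an index from the current step to the end of the episode.
            future_idx = rng.randint(t, len(trajectory) - 1)
            new_goal = trajectory[future_idx][2]
            reward = sparse_reward(next_state, new_goal)
            relabeled.append((state, action, next_state, new_goal, reward))
    return relabeled


# Usage: a short 2D trajectory that never reaches its commanded goal (9, 9).
trajectory = [
    ((0.0, 0.0), 0, (1.0, 0.0), (9.0, 9.0)),
    ((1.0, 0.0), 1, (1.0, 1.0), (9.0, 9.0)),
    ((1.0, 1.0), 0, (2.0, 1.0), (9.0, 9.0)),
]
replay_batch = relabel_future(trajectory, k=2)
```

Under these assumptions every original transition carries a reward of -1, yet the relabeled batch contains zero-reward transitions, which is why a goal-conditioned policy can learn without a hand-designed reward.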
Taxonomy
Topics: Reinforcement Learning in Robotics · Neural dynamics and brain function · Robot Manipulation and Learning
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Experience Replay
