One-Shot Hierarchical Imitation Learning of Compound Visuomotor Tasks
Tianhe Yu, Pieter Abbeel, Sergey Levine, Chelsea Finn

TL;DR
This paper introduces a method for learning complex multi-stage vision-based tasks on robots from a single human demonstration video, by learning primitive behaviors and their composition, enabling generalization to new objects and tasks.
Contribution
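The paper presents an approach that learns primitive behaviors from video demonstrations and composes them dynamically to perform multi-stage tasks. This reduces the demonstration data needed compared with treating a compound task as a single monolithic skill, and it improves generalization.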
Findings
Successfully learned multi-stage tasks on real robots
Demonstrated generalization to novel objects and environments
Achieved effective task execution from minimal demonstrations
Abstract
We consider the problem of learning multi-stage vision-based tasks on a real robot from a single video of a human performing the task, while leveraging demonstration data of subtasks with other objects. This problem presents a number of major challenges. Video demonstrations without teleoperation are easy for humans to provide, but do not provide any direct supervision. Learning policies from raw pixels enables full generality but calls for large function approximators with many parameters to be learned. Finally, compound tasks can require impractical amounts of demonstration data when treated as a monolithic skill. To address these challenges, we propose a method that learns both how to learn primitive behaviors from video demonstrations and how to dynamically compose these behaviors to perform multi-stage tasks by "watching" a human demonstrator. Our results on a simulated Sawyer…
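The composition idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: every name here (`segment_demo`, `run_compound_task`, `ends_primitive`, `infer_policy`, `primitive_done`) is hypothetical, and the learned components are replaced by toy callables on an integer state. It only shows the control flow of splitting one demonstration into per-primitive clips, inferring one policy per clip, and running those policies in sequence.

```python
from typing import Callable, List, Sequence, Tuple

# Toy stand-in for a video frame: (target position, is last frame of primitive).
Frame = Tuple[int, bool]


def segment_demo(frames: Sequence[Frame],
                 ends_primitive: Callable[[Frame], bool]) -> List[List[Frame]]:
    """Split a demonstration into one clip per primitive.

    `ends_primitive` is a hypothetical stand-in for a learned predictor
    that flags the final frame of each primitive.
    """
    segments: List[List[Frame]] = []
    current: List[Frame] = []
    for frame in frames:
        current.append(frame)
        if ends_primitive(frame):
            segments.append(current)
            current = []
    if current:  # keep a trailing clip that was never flagged as finished
        segments.append(current)
    return segments


def run_compound_task(segments, infer_policy, step, primitive_done,
                      state, max_steps=50):
    """Run one inferred policy per clip, in demonstration order."""
    for clip in segments:
        policy = infer_policy(clip)  # policy conditioned on a single clip
        for _ in range(max_steps):
            if primitive_done(clip, state):
                break  # hand over to the next primitive
            state = step(state, policy(state))
    return state


# Toy usage (assumed setup): two primitives, "move to 3" then "move to 1".
demo = [(3, False), (3, True), (1, False), (1, True)]
segments = segment_demo(demo, lambda frame: frame[1])
final_state = run_compound_task(
    segments,
    infer_policy=lambda clip: (lambda s: 1 if s < clip[-1][0] else -1),
    step=lambda s, action: s + action,
    primitive_done=lambda clip, s: s == clip[-1][0],
    state=0,
)
```

In this sketch the termination check is what makes the composition dynamic: each primitive runs for as many steps as it needs rather than for a fixed horizon.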
Taxonomy
Topics: Robot Manipulation and Learning · Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning
