One-shot Visual Imitation via Attributed Waypoints and Demonstration Augmentation
Matthew Chang, Saurabh Gupta

TL;DR
This paper introduces a modular approach for one-shot visual imitation that separates task inference from execution, using data augmentation to improve generalization, achieving significant success rate improvements on benchmarks.
Contribution
The paper proposes a novel modular framework that separates task inference from execution and employs data augmentation to enhance one-shot visual imitation performance.
Findings
Achieved 100% success on one benchmark and 48% on another.
Improved state-of-the-art success rates by 90% and 20%.
Identified key errors in existing methods and addressed them effectively.
Abstract
In this paper, we analyze the behavior of existing techniques and design new solutions for the problem of one-shot visual imitation. In this setting, an agent must solve a novel instance of a novel task given just a single visual demonstration. Our analysis reveals that current methods fall short because of three errors: the DAgger problem arising from purely offline training, last centimeter errors in interacting with objects, and mis-fitting to the task context rather than to the actual task. This motivates the design of our modular approach where we a) separate out task inference (what to do) from task execution (how to do it), and b) develop data augmentation and generation techniques to mitigate mis-fitting. The former allows us to leverage hand-crafted motor primitives for task execution which side-steps the DAgger problem and last centimeter errors, while the latter gets the…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMultimodal Machine Learning Applications · Human Pose and Action Recognition · Robot Manipulation and Learning
