VIEW: Visual Imitation Learning with Waypoints
Ananth Jonnavittula, Sagar Parekh, Dylan P. Losey

TL;DR
VIEW is a novel visual imitation learning algorithm that improves sample efficiency and enables robots to learn manipulation tasks from long videos, including human demonstrations, with minimal real-world interactions.
Contribution
The paper introduces VIEW, a new visual imitation learning (VIL) method that extracts a condensed prior trajectory from the video, uses an agent-agnostic reward, and segments the task to improve learning efficiency.
Findings
VIEW outperforms existing VIL methods in simulations and real-world tests.
Robots can learn complex manipulation tasks from long videos quickly.
Robots can learn effectively from a single demonstration in under 30 minutes.
Abstract
Robots can use Visual Imitation Learning (VIL) to learn manipulation tasks from video demonstrations. However, translating visual observations into actionable robot policies is challenging due to the high-dimensional nature of video data. This challenge is further exacerbated by the morphological differences between humans and robots, especially when the video demonstrations feature humans performing tasks. To address these problems we introduce Visual Imitation lEarning with Waypoints (VIEW), an algorithm that significantly enhances the sample efficiency of human-to-robot VIL. VIEW achieves this efficiency using a multi-pronged approach: extracting a condensed prior trajectory that captures the demonstrator's intent, employing an agent-agnostic reward function for feedback on the robot's actions, and utilizing an exploration algorithm that efficiently samples around waypoints in the…
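To make the waypoint idea in the abstract concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes the demonstration has already been reduced to a 2D point trajectory, uses Ramer-Douglas-Peucker simplification as a stand-in for the paper's prior-trajectory extraction, and uses plain Gaussian hill climbing as a stand-in for its exploration algorithm. The names `compress_trajectory`, `explore_around_waypoints`, and `toy_reward` are hypothetical, and the reward is a toy placeholder for an agent-agnostic reward.

```python
import numpy as np


def compress_trajectory(points, tol):
    """Reduce a dense trajectory to a few waypoints (Ramer-Douglas-Peucker).

    Stand-in for prior-trajectory extraction; not the paper's method.
    """
    points = np.asarray(points, dtype=float)
    if len(points) < 3:
        return points.copy()
    start, end = points[0], points[-1]
    seg = end - start
    seg_len = np.linalg.norm(seg)
    rel = points - start
    if seg_len == 0.0:
        dists = np.linalg.norm(rel, axis=1)
    else:
        # Perpendicular distance of each point to the line through start and end.
        proj = np.outer(rel @ seg / seg_len**2, seg)
        dists = np.linalg.norm(rel - proj, axis=1)
    idx = int(np.argmax(dists))
    if dists[idx] <= tol:
        return np.vstack([start, end])
    # Split at the farthest point and simplify both halves recursively.
    left = compress_trajectory(points[: idx + 1], tol)
    right = compress_trajectory(points[idx:], tol)
    return np.vstack([left[:-1], right])


def explore_around_waypoints(waypoints, reward_fn, rng, sigma=0.05, n_samples=32):
    """Sample Gaussian perturbations around the current best waypoints.

    Keeps a candidate only if its reward improves (simple hill climbing,
    an assumed stand-in for the paper's exploration algorithm).
    """
    best = np.array(waypoints, dtype=float)
    best_reward = reward_fn(best)
    for _ in range(n_samples):
        candidate = best + rng.normal(0.0, sigma, size=best.shape)
        reward = reward_fn(candidate)
        if reward > best_reward:
            best, best_reward = candidate, reward
    return best, best_reward


def toy_reward(waypoints):
    """Hypothetical reward: how close the final waypoint gets to a goal."""
    goal = np.array([1.0, 1.1])
    return -float(np.linalg.norm(waypoints[-1] - goal))


# An L-shaped demonstration: move right, then move up.
horizontal = np.column_stack([np.linspace(0.0, 1.0, 11), np.zeros(11)])
vertical = np.column_stack([np.ones(10), np.linspace(0.1, 1.0, 10)])
demo = np.vstack([horizontal, vertical])

prior = compress_trajectory(demo, tol=0.01)
refined, refined_reward = explore_around_waypoints(
    prior, toy_reward, np.random.default_rng(0)
)
```

In this sketch the 21-point demonstration collapses to three waypoints (start, corner, end), so the search perturbs six numbers instead of the full trajectory. That reduction in search dimension is the intuition behind the sample-efficiency claim, under the assumptions stated above.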
Taxonomy
Topics: Multimodal Machine Learning Applications
