VIEW: Visual Imitation Learning with Waypoints

Ananth Jonnavittula; Sagar Parekh; Dylan P. Losey

arXiv:2404.17906 · cs.RO · January 22, 2025 · 2 citations

Open Access

TL;DR

VIEW is a novel visual imitation learning algorithm that improves sample efficiency and enables robots to learn manipulation tasks from long videos, including human demonstrations, with minimal real-world interactions.

Contribution

The paper introduces VIEW, a new visual imitation learning (VIL) method that extracts key trajectories, uses an agent-agnostic reward, and segments tasks to make learning from videos more efficient.

Findings

01. VIEW outperforms existing VIL methods in simulation and in real-world tests.

02. Robots can quickly learn complex manipulation tasks from long videos.

03. VIEW learns effectively from a single demonstration in under 30 minutes.

Abstract

Robots can use Visual Imitation Learning (VIL) to learn manipulation tasks from video demonstrations. However, translating visual observations into actionable robot policies is challenging due to the high-dimensional nature of video data. This challenge is further exacerbated by the morphological differences between humans and robots, especially when the video demonstrations feature humans performing tasks. To address these problems we introduce Visual Imitation lEarning with Waypoints (VIEW), an algorithm that significantly enhances the sample efficiency of human-to-robot VIL. VIEW achieves this efficiency using a multi-pronged approach: extracting a condensed prior trajectory that captures the demonstrator's intent, employing an agent-agnostic reward function for feedback on the robot's actions, and utilizing an exploration algorithm that efficiently samples around waypoints in the…
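The idea of sampling around waypoints and scoring the result with a reward can be illustrated with a small sketch. This is not the authors' implementation: the Gaussian sampling scheme, the function names `refine_waypoint` and `toy_reward`, the parameters `num_samples` and `sigma`, and the distance-based reward are all assumptions made for illustration. On a real robot, each reward evaluation would presumably correspond to a rollout rather than a function call.

```python
import math
import random


def toy_reward(position, target=(0.5, 0.0, 0.2)):
    """Hypothetical stand-in reward: negative distance to an assumed target."""
    return -math.dist(position, target)


def refine_waypoint(prior_waypoint, reward_fn, num_samples=20, sigma=0.05, seed=0):
    """Sample candidates around a prior waypoint and keep the best-scoring one.

    Assumption: candidates are drawn from an isotropic Gaussian centred on
    the prior waypoint. The source does not specify the sampling scheme.
    """
    rng = random.Random(seed)
    prior = tuple(float(x) for x in prior_waypoint)
    # Include the prior itself so the result is never worse than the prior.
    candidates = [prior]
    for _ in range(num_samples):
        candidates.append(tuple(x + rng.gauss(0.0, sigma) for x in prior))
    best = max(candidates, key=reward_fn)
    return best, reward_fn(best)


prior = (0.45, 0.05, 0.25)
best_waypoint, best_reward = refine_waypoint(prior, toy_reward)
```

Keeping the prior in the candidate set means the refined waypoint scores at least as well as the starting point under the chosen reward, which is one simple way to make a small sampling budget safe to use.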

Taxonomy

Topics: Multimodal Machine Learning Applications