Learning Video-Conditioned Policies for Unseen Manipulation Tasks

Elliot Chane-Sane; Cordelia Schmid; Ivan Laptev

arXiv:2305.06289 · cs.RO · May 11, 2023 · 1 citation

Open Access

TL;DR

This paper introduces ViP, a video-conditioned policy learning method enabling robots to perform unseen manipulation tasks from human demonstration videos in a zero-shot setting, without task-specific training.

Contribution

The paper presents a novel zero-shot learning approach that maps human demonstration videos to robot actions using pre-trained video embeddings, avoiding task-specific training data.
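The mapping described above can be sketched in a few lines. This is a minimal toy illustration, not the authors' implementation: `embed_video`, `VideoConditionedPolicy`, the linear layer, the random weights, and all dimensions are hypothetical stand-ins, assuming only that a frozen video encoder produces a fixed-size embedding that is fed to the policy alongside the current observation.

```python
import random


def embed_video(frames):
    """Hypothetical stand-in for a frozen, pre-trained video encoder.

    Averages per-frame feature vectors into one fixed-size task embedding.
    """
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]


class VideoConditionedPolicy:
    """Toy linear policy: action = W @ concat(observation, video_embedding)."""

    def __init__(self, obs_dim, video_dim, action_dim, seed=0):
        rng = random.Random(seed)  # fixed seed so the sketch is reproducible
        in_dim = obs_dim + video_dim
        # Untrained random weights; a real policy would be learned from data.
        self.weights = [
            [rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
            for _ in range(action_dim)
        ]

    def act(self, observation, video_embedding):
        # Condition on the task by concatenating both inputs.
        x = list(observation) + list(video_embedding)
        return [sum(w * v for w, v in zip(row, x)) for row in self.weights]


# Hypothetical inputs: three frames of 4-d features and a 2-d scene observation.
demo_frames = [
    [1.0, 0.0, 2.0, 4.0],
    [3.0, 0.0, 2.0, 0.0],
    [2.0, 3.0, 2.0, 2.0],
]
task_embedding = embed_video(demo_frames)
policy = VideoConditionedPolicy(obs_dim=2, video_dim=4, action_dim=3)
action = policy.act([0.5, -0.5], task_embedding)
```

Because the task enters only through the embedding, a new human video can be supplied at test time without changing the policy weights, which is the sense in which such a policy could be used zero-shot.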

Findings

01. Outperforms state-of-the-art in multi-task manipulation environments

02. Enables zero-shot robot control from human videos

03. Effective generalization to unseen tasks

Abstract

The ability to specify robot commands by a non-expert user is critical for building generalist agents capable of solving a large variety of tasks. One convenient way to specify the intended robot goal is by a video of a person demonstrating the target task. While prior work typically aims to imitate human demonstrations performed in robot environments, here we focus on a more realistic and challenging setup with demonstrations recorded in natural and diverse human environments. We propose Video-conditioned Policy learning (ViP), a data-driven approach that maps human demonstrations of previously unseen tasks to robot manipulation skills. To this end, we learn our policy to generate appropriate actions given current scene observations and a video of the target task. To encourage generalization to new tasks, we avoid particular tasks during training and learn our policy from unlabelled…

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning

Methods: Test