Learning Generalizable Robot Policy with Human Demonstration Video as a Prompt

Xiang Zhu; Yichen Liu; Hezhong Li; Jianyu Chen

arXiv:2505.20795·cs.RO·May 28, 2025

Open Access

TL;DR

This paper introduces a two-stage framework enabling robots to learn new tasks directly from human demonstration videos, eliminating the need for additional data collection or model fine-tuning.

Contribution

The approach uses human demonstration videos as prompts for a robot policy, combining cross-prediction with a contrastive loss to improve generalization to new tasks without new teleoperation data.
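
This page does not spell out the contrastive loss. As an illustration only, one common formulation is InfoNCE, sketched below in plain Python under the assumption that paired human and robot clips have already been encoded into embedding vectors. The function names, the temperature value, and the pairing scheme are hypothetical and are not taken from the paper.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def info_nce(human_embs, robot_embs, temperature=0.1):
    """Mean InfoNCE loss over a batch (illustrative, not the authors' loss).

    Assumption: human_embs[i] and robot_embs[i] encode the same task
    (a positive pair); every other robot embedding in the batch is a negative.
    """
    h = [normalize(v) for v in human_embs]
    r = [normalize(v) for v in robot_embs]
    total = 0.0
    for i, hi in enumerate(h):
        # Similarity of human clip i to every robot clip, scaled by temperature.
        logits = [dot(hi, rj) / temperature for rj in r]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the matching robot clip as the target class.
        total += log_denom - logits[i]
    return total / len(h)


# Toy batch: two tasks with 2-D embeddings (made-up values).
human = [[1.0, 0.0], [0.0, 1.0]]
robot_aligned = [[1.0, 0.0], [0.0, 1.0]]
robot_swapped = [[0.0, 1.0], [1.0, 0.0]]
loss_aligned = info_nce(human, robot_aligned)
loss_swapped = info_nce(human, robot_swapped)
```

In this toy batch the aligned pairing yields a lower loss than the swapped one, which is the behavior a contrastive objective is meant to reward: embeddings of the same task from the two domains are pulled together and mismatched ones pushed apart.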

Findings

01 Effective in real-world dexterous manipulation tasks

02 Achieves generalization without additional data or fine-tuning

03 Outperforms baseline methods in task execution

Abstract

Recent robot learning methods commonly rely on imitation learning from massive robotic datasets collected with teleoperation. When facing a new task, such methods generally require collecting a new set of teleoperation data and finetuning the policy. Furthermore, the teleoperation data collection pipeline is tedious and expensive. In contrast, humans can efficiently learn new tasks just by watching others perform them. In this paper, we introduce a novel two-stage framework that utilizes human demonstrations to learn a generalizable robot policy. Such a policy can directly take a human demonstration video as a prompt and perform new tasks without any new teleoperation data or model finetuning. In the first stage, we train a video generation model that captures a joint representation for both human and robot demonstration video data using cross-prediction. In the second stage, we fuse…

Taxonomy

Topics: Reinforcement Learning in Robotics