Video2Policy: Scaling up Manipulation Tasks in Simulation through Internet Videos
Weirui Ye, Fangchen Liu, Zheng Ding, Yang Gao, Oleh Rybkin, Pieter Abbeel

TL;DR
Video2Policy is a framework that reconstructs diverse simulation tasks from internet videos, enabling scalable training of generalist robot policies, including on complex tasks such as throwing, with successful transfer to real robots.
Contribution
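The paper presents a novel method that reconstructs simulation tasks from internet videos and trains policies with in-context LLM-generated rewards. It targets two limitations of prior approaches: LLM-generated tasks that may be hallucinated and uninteresting for robotics, and digital twins that require careful real-to-sim alignment and are hard to scale.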
Findings
Reconstructed over 100 diverse human behavior videos into simulation tasks.
Successfully trained RL policies on complex tasks like throwing.
Demonstrated transfer of policies from simulation to real robots.
Abstract
Simulation offers a promising approach for cheaply scaling training data for generalist policies. To scalably generate data from diverse and realistic tasks, existing algorithms either rely on large language models (LLMs) that may hallucinate tasks not interesting for robotics; or digital twins, which require careful real-to-sim alignment and are hard to scale. To address these challenges, we introduce Video2Policy, a novel framework that leverages internet RGB videos to reconstruct tasks based on everyday human behavior. Our approach comprises two phases: (1) task generation in simulation from videos; and (2) reinforcement learning utilizing in-context LLM-generated reward functions iteratively. We demonstrate the efficacy of Video2Policy by reconstructing over 100 videos from the Something-Something-v2 (SSv2) dataset, which depicts diverse and complex human behaviors on 9 different…
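The second phase described in the abstract, reinforcement learning with iteratively refined LLM-generated reward functions, can be illustrated with a minimal sketch. This is not the authors' implementation: `propose_reward`, `train_and_evaluate`, `refine_reward`, and the `Attempt` record are hypothetical names, the "LLM" and the "simulator" are replaced by toy placeholders, and the stopping threshold is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A reward function maps a (toy, scalar) state to a scalar reward.
RewardFn = Callable[[float], float]


@dataclass
class Attempt:
    reward_source: str   # text of the reward candidate shown back in context
    success_rate: float  # evaluation result fed into the next proposal


def propose_reward(task: str, history: List[Attempt]) -> Tuple[str, RewardFn]:
    """Hypothetical stand-in for an LLM call that sees earlier attempts in context.

    Toy behavior: each round increases a shaping coefficient by one.
    """
    scale = 1.0 + len(history)
    source = f"# task: {task}\nreward = {scale} * state"
    return source, lambda state: scale * state


def train_and_evaluate(reward_fn: RewardFn) -> float:
    """Hypothetical stand-in for RL training and evaluation in simulation.

    Toy behavior: success grows with the reward at state 1.0, capped at 1.0.
    """
    return min(1.0, 0.25 * reward_fn(1.0))


def refine_reward(task: str, rounds: int = 5, target: float = 0.9) -> List[Attempt]:
    """Propose, train, evaluate, and feed results back until the target is met."""
    history: List[Attempt] = []
    for _ in range(rounds):
        source, reward_fn = propose_reward(task, history)
        success = train_and_evaluate(reward_fn)
        history.append(Attempt(source, success))
        if success >= target:  # assumed stopping rule
            break
    return history


if __name__ == "__main__":
    for i, attempt in enumerate(refine_reward("lift the cup"), start=1):
        print(f"round {i}: success_rate={attempt.success_rate}")
```

The point of the sketch is the feedback structure: each reward candidate is conditioned on the record of earlier candidates and their measured success, so the proposal step can improve from one round to the next.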
Taxonomy
Topics: Multimedia Communication and Technology
