Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations
Yucheng Hu, Yanjiang Guo, Pengchao Wang, Xiaoyu Chen, Yen-Jen Wang, Jianke Zhang, Koushil Sreenath, Chaochao Lu, Jianyu Chen

TL;DR
This paper introduces Video Prediction Policy (VPP), a robot control method leveraging the predictive capabilities of video diffusion models to improve generalization and success in complex manipulation tasks.
Contribution
The paper proposes VPP, a novel approach that uses pre-trained video diffusion models to incorporate future dynamics into robot policies, enhancing generalization and performance.
Findings
VPP achieves an 18.6% relative improvement on the Calvin ABC-D generalization benchmark.
VPP achieves a 31.6% increase in success rates on real-world dexterous tasks.
Fine-tuning video foundation models enhances future prediction accuracy for robotic control.
Abstract
Visual representations play a crucial role in developing generalist robotic policies. Previous vision encoders, typically pre-trained with single-image reconstruction or two-image contrastive learning, tend to capture static information, often neglecting the dynamic aspects vital for embodied tasks. Recently, video diffusion models (VDMs) have demonstrated the ability to predict future frames and shown a strong understanding of the physical world. We hypothesize that VDMs inherently produce visual representations that encompass both current static information and predicted future dynamics, thereby providing valuable guidance for robot action learning. Based on this hypothesis, we propose the Video Prediction Policy (VPP), which learns an implicit inverse dynamics model conditioned on predicted future representations inside VDMs. To predict a more precise future, we fine-tune pre-trained video…
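The data flow described in the abstract (current observation, then predicted future representations, then an action) can be illustrated with a toy sketch. This is not the VPP architecture: the random linear maps below are hypothetical stand-ins for the video diffusion model and the policy head, and every name and dimension (`predict_future_features`, `policy_action`, `FEAT_DIM`, `ACTION_DIM`, `HORIZON`) is an assumption made for illustration only.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix; a placeholder for learned weights."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def predict_future_features(current, W_pred, horizon):
    """Hypothetical stand-in for a video predictor.

    Rolls the current feature vector forward `horizon` steps and returns
    the current features plus the predicted future ones.
    """
    feats = [current]
    for _ in range(horizon):
        feats.append([math.tanh(v) for v in matvec(W_pred, feats[-1])])
    return feats  # horizon + 1 feature vectors


def policy_action(feats, W_act):
    """Toy action head: average features over time, then apply a linear map.

    The action depends on both current and predicted future features,
    which is the conditioning idea the abstract describes.
    """
    dim = len(feats[0])
    pooled = [sum(f[i] for f in feats) / len(feats) for i in range(dim)]
    return matvec(W_act, pooled)


rng = random.Random(0)
FEAT_DIM, ACTION_DIM, HORIZON = 16, 7, 4  # assumed sizes, not from the paper

W_pred = random_matrix(FEAT_DIM, FEAT_DIM, rng)
W_act = random_matrix(ACTION_DIM, FEAT_DIM, rng)
current = [rng.gauss(0.0, 1.0) for _ in range(FEAT_DIM)]

feats = predict_future_features(current, W_pred, HORIZON)
action = policy_action(feats, W_act)
print(len(feats), len(action))  # prints "5 7"
```

The sketch only fixes the interface: the action head never sees future ground-truth frames, only features rolled forward from the current observation. In the paper, that role is played by representations inside a fine-tuned video diffusion model rather than a random linear map.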
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: Diffusion · Contrastive Learning
