Large Video Planner Enables Generalizable Robot Control

Boyuan Chen, Tianyuan Zhang, Haoran Geng, Caiyi Zhang, Peihao Li, Kiwhan Song, William T. Freeman, Jitendra Malik, Pieter Abbeel, Russ Tedrake, Vincent Sitzmann, and Yilun Du

arXiv:2512.15840·cs.RO·May 11, 2026

TL;DR

This paper introduces a large-scale video pretraining approach for robot control, enabling zero-shot planning and execution in diverse real-world tasks, with open-sourced models and datasets.

Contribution

The paper pioneers large-scale video pretraining as the primary modality for robot foundation models, demonstrating strong generalization and real-world applicability.

Findings

1. Zero-shot video plans enable successful robot task execution.

2. The model generalizes across diverse scenes and tasks.

3. Open dataset and model support reproducibility and further research.

Abstract

General-purpose robots require decision-making models that generalize across diverse tasks and environments. Recent works build robot foundation models by extending multimodal large language models (MLLMs) with action outputs, creating vision-language-action (VLA) systems. These efforts are motivated by the intuition that MLLMs' large-scale language and image pretraining can be effectively transferred to the action output modality. In this work, we explore an alternative paradigm of using large-scale video pretraining as a primary modality for building robot foundation models. Unlike static images and language, videos capture spatio-temporal sequences of states and actions in the physical world that are naturally aligned with robotic behavior. We curate an internet-scale video dataset of human activities and task demonstrations, and train, for the first time at a foundation-model scale,…

Code & Models

Models

KempnerInstituteAI/LVP
