Video as the New Language for Real-World Decision Making
Sherry Yang, Jacob Walker, Jack Parker-Holder, Yilun Du, Jake Bruce, Andre Barreto, Pieter Abbeel, Dale Schuurmans

TL;DR
This paper argues that video can serve as a universal interface for AI: like language models, video generation can represent complex real-world information and support tasks such as planning and simulation.
Contribution
The paper highlights the untapped potential of video generation for real-world decision making and discusses how it can serve as a versatile tool across AI applications.
Findings
Video can serve as a unified interface for diverse AI tasks.
Video generation techniques can enable planning, reasoning, and simulation.
Challenges in video generation need to be addressed to unlock its full potential.
Abstract
Both text and video data are abundant on the internet and support large-scale self-supervised learning through next token or frame prediction. However, they have not been equally leveraged: language models have had significant real-world impact, whereas video generation has remained largely limited to media entertainment. Yet video data captures important information about the physical world that is difficult to express in language. To address this gap, we discuss an under-appreciated opportunity to extend video generation to solve tasks in the real world. We observe how, akin to language, video can serve as a unified interface that can absorb internet knowledge and represent diverse tasks. Moreover, we demonstrate how, like language models, video generation can serve as planners, agents, compute engines, and environment simulators through techniques such as in-context learning,…
Taxonomy
Topics: Multimedia Communication and Technology
