Pandora: Towards General World Model with Natural Language Actions and Video States
Jiannan Xiang, Guangyi Liu, Yi Gu, Qiyue Gao, Yuting Ning, Yuheng Zha, Zeyu Feng, Tianhua Tao, Shibo Hao, Yemin Shi, Zhengzhong Liu, Eric P. Xing, Zhiting Hu

TL;DR
Pandora is a hybrid model that generates video-based world states and enables real-time, natural language-controlled interactions, advancing the development of general, multi-domain world models.
Contribution
It introduces Pandora, a hybrid autoregressive-diffusion model that combines pretrained language and video models for controllable, domain-general world simulation.
Findings
Pandora achieves domain generality and video consistency.
It enables real-time control with free-text actions.
A wide range of generated outputs demonstrates its versatility.
Abstract
World models simulate future states of the world in response to different actions. They facilitate interactive content creation and provide a foundation for grounded, long-horizon reasoning. Current foundation models do not fully meet the capabilities of general world models: large language models (LLMs) are constrained by their reliance on language modality and their limited understanding of the physical world, while video models lack interactive action control over the world simulations. This paper makes a step towards building a general world model by introducing Pandora, a hybrid autoregressive-diffusion model that simulates world states by generating videos and allows real-time control with free-text actions. Pandora achieves domain generality, video consistency, and controllability through large-scale pretraining and instruction tuning. Crucially, Pandora bypasses the cost of…
Taxonomy
Topics: Semantic Web and Ontologies · Artificial Intelligence in Games · Multi-Agent Systems and Negotiation
