DreamWorld: Unified World Modeling in Video Generation

Boming Tan; Xiangdong Zhang; Ning Liao; Yuqing Zhang; Shaofeng Zhang; Xue Yang; Qi Fan; Yanyong Zhang

arXiv:2603.00466 · cs.CV · March 3, 2026

TL;DR

DreamWorld introduces a unified framework for video generation that jointly models multiple aspects of the world, such as physical and semantic knowledge, to produce more coherent and consistent videos.

Contribution

DreamWorld proposes a joint world modeling paradigm, supported by techniques such as CCA and multi-source guidance, to improve the consistency and realism of generated videos.

Findings

01. Outperforms previous models by 2.26 points on VBench.

02. Improves world consistency in generated videos.

03. Addresses visual instability and flickering in generated videos.

Abstract

Despite impressive progress in video generation, existing models remain limited to surface-level plausibility and lack a coherent, unified understanding of the world. Prior approaches typically incorporate only a single form of world-related knowledge or rely on rigid alignment strategies to introduce additional knowledge. However, aligning a single form of world knowledge is insufficient to constitute a world model, which requires jointly modeling multiple heterogeneous dimensions (e.g., physical commonsense, 3D consistency, and temporal consistency). To address this limitation, we introduce DreamWorld, a unified framework that integrates complementary world knowledge into video generators via a Joint World Modeling Paradigm, which jointly predicts video pixels and features from foundation models to capture temporal dynamics, spatial geometry, and semantic consistency. However, naively…
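To make "jointly predicting video pixels and features from foundation models" concrete, here is a minimal sketch of what a joint objective of this kind could look like. It is not the authors' implementation. It assumes a simple weighted sum of one video reconstruction term and one feature-regression term per knowledge source. The function names `mse` and `joint_world_loss`, the source names `temporal`, `geometry`, and `semantic`, and the per-source weights are all hypothetical. Because the abstract is truncated before it explains what goes wrong with naive combination, CCA and multi-source guidance are not modeled here.

```python
from typing import Dict, Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length flattened vectors."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def joint_world_loss(
    video_pred: Sequence[float],
    video_target: Sequence[float],
    feature_preds: Dict[str, Sequence[float]],
    feature_targets: Dict[str, Sequence[float]],
    weights: Dict[str, float],
) -> float:
    """Hypothetical joint objective (a naive weighted sum, not DreamWorld's method).

    One term compares predicted and target video values; each additional term
    regresses the generator's predicted features onto target features taken
    from a (hypothetical) frozen foundation model for one knowledge source.
    Sources missing from `weights` default to weight 1.0.
    """
    loss = mse(video_pred, video_target)
    for name, target in feature_targets.items():
        loss += weights.get(name, 1.0) * mse(feature_preds[name], target)
    return loss


if __name__ == "__main__":
    total = joint_world_loss(
        video_pred=[0.0, 1.0],
        video_target=[0.0, 0.0],
        feature_preds={"temporal": [1.0, 1.0], "geometry": [2.0]},
        feature_targets={"temporal": [1.0, 3.0], "geometry": [0.0]},
        weights={"temporal": 0.5, "geometry": 0.25},
    )
    print(total)  # 0.5 + 0.5 * 2.0 + 0.25 * 4.0 = 2.5
```

The weighted sum is the simplest way to combine heterogeneous supervision signals. It is also the "naive" baseline that the abstract warns against, since a single fixed set of weights gives no mechanism for resolving conflicts between knowledge sources.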


Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Human Motion and Animation · 3D Shape Modeling and Analysis