ST-Gen4D: Embedding 4D Spatiotemporal Cognition into World Model for 4D Generation
Haonan Wang, Hanyu Zhou, Tao Gu, Luxin Yan

TL;DR
ST-Gen4D introduces a novel 4D spatiotemporal cognition-based world model that enhances 4D generation by capturing local dynamics and global appearance, outperforming existing methods.
Contribution
The paper presents a new framework integrating 4D cognition with generative priors, enabling structurally rational 4D generation with topological consistency.
Findings
Outperforms existing 4D generation methods in experiments.
Guarantees structural rationality and topological consistency.
Introduces ST-4D datasets for benchmarking.
Abstract
Generative models have succeeded in producing apparently coherent 2D videos, but generating content that holds up in the physical world remains challenging because they lack a 4D spatiotemporal scale. Typically, existing 4D generative models directly embed macro-scale constraints to enhance overall spatiotemporal consistency. However, these methods only ensure global appearance coherence and fail to reveal the local dynamics of the physical world. Our insight is that global appearance structure and local dynamic topology together empower 4D spatiotemporal cognition, thereby enabling 4D generation with spatiotemporal regularities. In this work, we propose ST-Gen4D, a 4D generation framework with a 4D spatiotemporal cognition-based world model. Our model is guided by four key designs: 1) Spatiotemporal representation. We encode various modalities into multiple representations as a feature basis. 2) Spatiotemporal cognition. We…
