Spatia: Video Generation with Updatable Spatial Memory

Jinjing Zhao; Fangyun Wei; Zhening Liu; Hongyang Zhang; Chang Xu; Yan Lu

arXiv:2512.15716·cs.CV·December 18, 2025

TL;DR

Spatia is a video generation framework that keeps a persistent 3D scene point cloud as spatial memory and updates it through visual SLAM, improving long-term spatial consistency and enabling explicit camera control and 3D-aware interactive editing.

Contribution

The paper proposes a spatial memory-aware framework that explicitly maintains a 3D scene point cloud for scalable, consistent, and editable video generation.

Findings

01. Enhanced spatial consistency in generated videos.

02. Ability to perform explicit camera control and 3D-aware editing.

03. Effective integration of visual SLAM for memory updates.

Abstract

Existing video generation models struggle to maintain long-term spatial and temporal consistency due to the dense, high-dimensional nature of video signals. To overcome this limitation, we propose Spatia, a spatial memory-aware video generation framework that explicitly preserves a 3D scene point cloud as persistent spatial memory. Spatia iteratively generates video clips conditioned on this spatial memory and continuously updates it through visual SLAM. This dynamic-static disentanglement design enhances spatial consistency throughout the generation process while preserving the model's ability to produce realistic dynamic entities. Furthermore, Spatia enables applications such as explicit camera control and 3D-aware interactive editing, providing a geometrically grounded framework for scalable, memory-driven video generation.
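
The generate-then-update loop described in the abstract can be illustrated with a toy sketch. This is not Spatia's implementation: `SpatialMemory`, `generate_clip`, `estimate_static_points`, and `run` are hypothetical names, the generator is a stub that only records how much memory it was conditioned on, and visual SLAM is replaced by random points sampled near each camera pose. The sketch only shows the control flow, in which each clip is generated against the current point cloud and the point cloud is then extended from that clip.

```python
from dataclasses import dataclass, field
import random


@dataclass
class SpatialMemory:
    """Hypothetical stand-in for the persistent 3D scene point cloud."""
    voxel_size: float = 0.5
    voxels: dict = field(default_factory=dict)  # voxel index -> one (x, y, z) point

    def update(self, points):
        """Merge points, keeping one per voxel; return how many were new."""
        added = 0
        for p in points:
            key = tuple(int(c // self.voxel_size) for c in p)
            if key not in self.voxels:
                self.voxels[key] = p
                added += 1
        return added


def generate_clip(memory, camera_path):
    """Placeholder generator: one 'frame' per camera pose, conditioned on memory."""
    return [{"pose": pose, "memory_points": len(memory.voxels)} for pose in camera_path]


def estimate_static_points(clip, rng, points_per_frame=20):
    """Placeholder for visual SLAM: sample fake static points near each pose."""
    points = []
    for frame in clip:
        x, y, z = frame["pose"]
        for _ in range(points_per_frame):
            points.append((x + rng.uniform(-1, 1),
                           y + rng.uniform(-1, 1),
                           z + rng.uniform(-1, 1)))
    return points


def run(num_clips=3, seed=0):
    """Iterate: generate a clip from memory, then update memory from the clip."""
    rng = random.Random(seed)
    memory = SpatialMemory()
    sizes = []
    for i in range(num_clips):
        camera_path = [(float(i), 0.0, 0.0), (i + 0.5, 0.0, 0.0)]  # assumed toy trajectory
        clip = generate_clip(memory, camera_path)
        memory.update(estimate_static_points(clip, rng))
        sizes.append(len(memory.voxels))
    return memory, sizes
```

Calling `run(num_clips=3)` returns the final memory and its size after each clip. Because the update only adds points in voxels not yet occupied, the memory grows monotonically and revisited regions are not duplicated, which is one simple way a persistent scene representation could stay bounded as generation continues.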

Code & Models

Models

🤗 Jinjing713/Spatia · model · ♡ 2

Taxonomy

Topics: Human Motion and Animation · Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis