Memorize When Needed: Decoupled Memory Control for Spatially Consistent Long-Horizon Video Generation

Yanjun Guo; Zhengqiang Zhang; Pengfei Wang; Xinyue Liang; Zhiyuan Ma; Lei Zhang

arXiv:2604.18215·cs.CV·April 22, 2026

TL;DR

This paper introduces a decoupled memory control framework for long-horizon video generation that improves spatial consistency and efficiency by separating memory from the generative process.

Contribution

It proposes a lightweight, independent memory module with cross-attention and camera-aware gating to enhance spatial consistency and reduce training costs.
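The idea of a memory branch that reads from stored observations through cross-attention and is modulated by a camera-aware gate can be illustrated with a small sketch. This is not the authors' implementation: the summary here does not describe the actual gating formulation, so `camera_gate`, its `sharpness` and `threshold` parameters, and the use of view-direction cosine similarity are all hypothetical choices made for illustration. The attention also omits learned projections and multiple heads.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_attention(queries, memory):
    """Single-head cross-attention without learned projections (illustrative only).

    Each query token attends over the memory tokens, which serve as both keys and values.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, m) / math.sqrt(d) for m in memory])
        out.append([sum(w * m[i] for w, m in zip(weights, memory)) for i in range(d)])
    return out

def camera_gate(cur_dir, mem_dir, sharpness=8.0, threshold=0.5):
    """Hypothetical gate: sigmoid of the cosine similarity between unit view directions.

    Close to 1 when the current camera looks the same way as the memorized view
    (a revisit), close to 0 when it looks elsewhere (a novel region).
    """
    cos = dot(cur_dir, mem_dir)
    return 1.0 / (1.0 + math.exp(-sharpness * (cos - threshold)))

def memory_branch(tokens, memory, cur_dir, mem_dir):
    """Add gated memory read-out to the generator tokens as a residual."""
    g = camera_gate(cur_dir, mem_dir)
    attended = cross_attention(tokens, memory)
    return [[t + g * a for t, a in zip(tok, att)] for tok, att in zip(tokens, attended)]

tokens = [[1.0, 0.0], [0.0, 1.0]]   # current-frame tokens (toy values)
memory = [[2.0, 0.0], [0.0, 2.0]]   # memory tokens (toy values)

revisit = memory_branch(tokens, memory, (0.0, 0.0, 1.0), (0.0, 0.0, 1.0))
novel = memory_branch(tokens, memory, (0.0, 0.0, 1.0), (0.0, 0.0, -1.0))
```

Under these assumptions, `revisit` is pulled strongly toward the memory content, while `novel` stays almost identical to `tokens`, so the generator is left unconstrained where no relevant memory exists. That behavior mirrors the "memorize when needed" framing in the title, though the real gate may be learned and conditioned on richer camera information.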

Findings

1. Achieves state-of-the-art spatial consistency in generated videos.

2. Reduces training costs compared to entangled memory models.

3. Enhances the ability to explore novel scene regions.

Abstract

Spatially consistent long-horizon video generation aims to maintain temporal and spatial consistency along predefined camera trajectories. Existing methods mostly entangle memory modeling with video generation, which leads to inconsistent content during scene revisits and diminished generative capacity when exploring novel regions, even when trained on extensive annotated data. To address these limitations, we propose a decoupled framework that separates memory conditioning from generation. Our approach significantly reduces training costs while enhancing spatial consistency and preserving the generative capacity for novel scene exploration. Specifically, we employ a lightweight, independent memory branch to learn precise spatial consistency from historical observations. We first introduce a hybrid memory representation to capture complementary temporal and spatial cues from…
