I3DM: Implicit 3D-aware Memory Retrieval and Injection for Consistent Video Scene Generation

Jia Li; Han Yan; Yihang Chen; Siqi Li; Xibin Song; Yifu Wang; Jianfei Cai; Tien-Tsin Wong; Pan Ji

arXiv:2603.23413 · cs.CV · March 25, 2026

TL;DR

I3DM introduces an implicit 3D-aware memory mechanism for consistent video scene generation that avoids explicit 3D reconstruction and improves revisit consistency, fidelity, and camera control.

Contribution

The paper proposes a novel implicit 3D-aware memory retrieval and injection method that enhances long-term scene consistency without explicit 3D modeling.

Findings

1. Outperforms state-of-the-art methods in revisit consistency
2. Achieves higher generation fidelity
3. Provides more accurate camera control

Abstract

Despite remarkable progress in video generation, maintaining long-term scene consistency upon revisiting previously explored areas remains challenging. Existing solutions rely either on explicitly constructing 3D geometry, which suffers from error accumulation and scale ambiguity, or on naive camera Field-of-View (FoV) retrieval, which typically fails under complex occlusions. To overcome these limitations, we propose I3DM, a novel implicit 3D-aware memory mechanism for consistent video scene generation that bypasses explicit 3D reconstruction. At the core of our approach is a 3D-aware memory retrieval strategy, which leverages the intermediate features of a pre-trained Feed-Forward Novel View Synthesis (FF-NVS) model to score view relevance, enabling robust retrieval even in highly occluded scenarios. Furthermore, to fully utilize the retrieved historical frames, we introduce a…
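The retrieval idea described in the abstract (score each stored view for relevance to the target view, then keep the best ones) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how relevance is computed from the FF-NVS features, so the sketch assumes each view is summarized by a single feature vector and swaps in plain cosine similarity as the relevance score. The names `cosine_similarity` and `retrieve_memory_frames`, the `top_k` parameter, and the toy vectors are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero feature as irrelevant
    return dot / (norm_a * norm_b)


def retrieve_memory_frames(
    target_feature: Sequence[float],
    memory_features: Sequence[Sequence[float]],
    top_k: int = 2,
) -> List[Tuple[int, float]]:
    """Return (frame index, score) for the top_k most relevant memory frames.

    ASSUMPTION: relevance is cosine similarity between per-view feature
    vectors. In the paper the features come from a pre-trained FF-NVS
    model; here they are placeholder vectors supplied by the caller.
    """
    scored = [
        (index, cosine_similarity(target_feature, feature))
        for index, feature in enumerate(memory_features)
    ]
    # Highest relevance first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy example: three stored frames with hypothetical 2-D features.
memory = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
target = [1.0, 0.1]
hits = retrieve_memory_frames(target, memory, top_k=2)
print([index for index, _ in hits])  # [0, 2]
```

The point of the sketch is only the control flow: retrieval ranks stored frames by a learned-feature score instead of by camera FoV overlap, which the abstract identifies as the approach that typically fails under complex occlusions.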

Taxonomy

Topics: Advanced Vision and Imaging · 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis