Echo-Forcing: A Scene Memory Framework for Interactive Long Video Generation

Mingqiang Wu; Weilun Feng; Zhefeng Zhang; Haotong Qin; Yuqi Li; Guoxin Fan; Xiaokun Liu; Zhulin An; Libo Huang; Yongjun Xu; Chuanguang Yang

arXiv:2605.16003 · cs.CV · May 18, 2026

TL;DR

Echo-Forcing introduces a scene memory framework for interactive long video generation, enabling smooth scene transitions, long-term recall, and prompt responsiveness without additional training.

Contribution

It proposes a training-free, hierarchical scene memory framework with novel mechanisms to improve interactive long video generation.

Findings

01. Achieves superior performance on VBench-Long in long-video and interactive scenarios.

02. Supports smooth transitions, hard cuts, and long-range scene recall within a bounded cache.

03. Demonstrates effectiveness through extensive evaluations.

Abstract

Autoregressive video diffusion models enable open-ended generation through local attention and KV caching. However, existing training-free long-video optimization methods mainly focus on stable extension under a single prompt, so they struggle with interactive scenarios that involve prompt switching, forgetting of old scenes, and recall of historical scenes. We identify the core bottleneck as the functional entanglement of historical KV states: stable anchors and recent dynamics are handled by the same cache policy, leading to outdated background contamination, delayed response to new prompts, and loss of long-range memory. To address this issue, we propose Echo-Forcing, a training-free scene memory framework specifically designed for interactive long video generation with three core mechanisms: (1) Hierarchical Temporal Memory, which decouples stable anchors, compressed history, and…
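To make the idea of decoupled cache policies concrete, here is a minimal toy sketch in Python. It is not the paper's implementation: the class name `TieredCache`, the tier sizes, and the mean-based merging rule are all hypothetical, and the "recent" tier is taken from the abstract's mention of recent dynamics rather than from the truncated list of mechanisms. Plain lists of floats stand in for KV states.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TieredCache:
    """Toy bounded cache with one policy per tier (hypothetical design)."""
    max_anchors: int = 2   # stable anchors: never evicted by the sliding window
    max_recent: int = 4    # recent dynamics: FIFO sliding window
    max_history: int = 3   # compressed history: merged summaries of evicted states
    anchors: list = field(default_factory=list)
    recent: deque = field(default_factory=deque)
    history: list = field(default_factory=list)

    def add(self, state: list) -> None:
        # The first states fill the anchor tier and stay there.
        if len(self.anchors) < self.max_anchors:
            self.anchors.append(state)
            return
        # Later states enter the recent window; overflow is compressed, not dropped.
        self.recent.append(state)
        if len(self.recent) > self.max_recent:
            self._compress(self.recent.popleft())

    def _compress(self, evicted: list) -> None:
        # Assumption: compression is an elementwise mean of the two oldest
        # history entries. The actual compression scheme is not given here.
        self.history.append(evicted)
        if len(self.history) > self.max_history:
            a, b = self.history[0], self.history[1]
            self.history[:2] = [[(x + y) / 2 for x, y in zip(a, b)]]

    def size(self) -> int:
        return len(self.anchors) + len(self.history) + len(self.recent)

    def context(self) -> list:
        # Order: anchors first, then compressed history, then the recent window.
        return self.anchors + self.history + list(self.recent)


cache = TieredCache()
for i in range(20):
    cache.add([float(i)])
print(cache.size())  # 9: 2 anchors + 3 history + 4 recent
```

The point of the sketch is only that each tier has its own eviction rule, so the total size stays bounded no matter how many states are added, while the earliest states and a summary of the middle remain available.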


Code & Models

Repositories

mingqiangWu/Echo-Forcing (GitHub)
