Anchor Forcing: Anchor Memory and Tri-Region RoPE for Interactive Streaming Video Diffusion

Yang Yang; Tianyi Zhang; Wei Huang; Jinwei Chen; Boxi Wu; Xiaofei He; Deng Cai; Bo Li; Peng-Tao Jiang

arXiv:2603.13405 · cs.CV · March 17, 2026

TL;DR

This paper introduces Anchor Forcing, a cache-centric framework for interactive streaming video diffusion. It improves perceptual quality and motion consistency across prompt switches by combining an anchor cache with tri-region RoPE.

Contribution

The paper proposes a cache-centric framework that combines anchor-guided re-cache with tri-region RoPE to address the progressive quality degradation and weakened motion dynamics seen in streaming video diffusion.

Findings

01. Improves perceptual quality over prior streaming methods.

02. Improves motion retention over long horizons.

03. Stabilizes quality during prompt switches.

Abstract

Interactive long video generation requires prompt switching to introduce new subjects or events, while maintaining perceptual fidelity and coherent motion over extended horizons. Recent distilled streaming video diffusion models reuse a rolling KV cache for long-range generation, enabling prompt-switch interaction through re-cache at each switch. However, existing streaming methods still exhibit progressive quality degradation and weakened motion dynamics. We identify two failure modes specific to interactive streaming generation: (i) at each prompt switch, current cache maintenance cannot simultaneously retain KV-based semantic context and recent latent cues, resulting in weak boundary conditioning and reduced perceptual quality; and (ii) during distillation, unbounded time indexing induces a positional distribution shift from the pretrained backbone's bounded RoPE regime, weakening…
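The second failure mode described in the abstract can be illustrated with a toy sketch. It is not the paper's method, and it does not implement anchor caches or tri-region RoPE. It only shows, under stated assumptions, why a rolling cache indexed by absolute time drifts outside a bounded positional range. The class `RollingKVCache`, the helper `out_of_range_positions`, and the constants `WINDOW` and `TRAINED_MAX` are hypothetical names and values chosen for illustration.

```python
from collections import deque


class RollingKVCache:
    """Toy rolling cache that keeps only the most recent `window` entries."""

    def __init__(self, window):
        # deque with maxlen evicts the oldest entry automatically when full
        self.entries = deque(maxlen=window)

    def append(self, frame_index, kv):
        # Store the absolute (unbounded) time index next to its key/value payload
        self.entries.append((frame_index, kv))

    def positions(self):
        return [idx for idx, _ in self.entries]


def out_of_range_positions(positions, trained_max):
    """Return the positions at or beyond the assumed pretraining range."""
    return [p for p in positions if p >= trained_max]


WINDOW = 4       # hypothetical cache size, in frames
TRAINED_MAX = 8  # hypothetical count of time positions seen in pretraining

cache = RollingKVCache(WINDOW)
for t in range(12):
    cache.append(t, kv=f"kv_{t}")  # string placeholder for real key/value tensors

cached_positions = cache.positions()
shifted = out_of_range_positions(cached_positions, TRAINED_MAX)
print(cached_positions)  # [8, 9, 10, 11]
print(shifted)           # [8, 9, 10, 11]
```

In this sketch the cache size stays fixed, but every cached entry ends up carrying a time index the backbone would never have seen under the assumed bound. That is one way to picture the positional distribution shift the abstract refers to.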

Code & Models

Models

🤗 young98/AnchorForcing (model)

Taxonomy

Topics: Image and Video Quality Assessment · Generative Adversarial Networks and Image Synthesis · Video Coding and Compression Technologies