MemRoPE: Training-Free Infinite Video Generation via Evolving Memory Tokens
Youngrae Kim, Qixin Hu, C.-C. Jay Kuo, Peter A. Beerel

TL;DR
MemRoPE introduces a training-free method for infinite video generation that maintains long-term and short-term memory streams to improve temporal coherence and fidelity over extended durations.
Contribution
Proposes a training-free framework that pairs evolving Memory Tokens with Online RoPE Indexing to counter fidelity degradation, identity drift, and motion stagnation in long-horizon video generation.
Findings
Outperforms existing methods in temporal coherence
Improves visual fidelity in long-duration videos
Maintains subject consistency over extended periods
Abstract
Autoregressive diffusion enables real-time frame streaming, yet existing sliding-window caches discard past context, causing fidelity degradation, identity drift, and motion stagnation over long horizons. Current approaches preserve a fixed set of early tokens as attention sinks, but this static anchor cannot reflect the evolving content of a growing video. We introduce MemRoPE, a training-free framework with two co-designed components. Memory Tokens continuously compress all past keys into dual long-term and short-term streams via exponential moving averages, maintaining both global identity and recent dynamics within a fixed-size cache. Online RoPE Indexing caches unrotated keys and applies positional embeddings dynamically at attention time, ensuring the aggregation is free of conflicting positional phases. These two mechanisms are mutually enabling: positional decoupling makes…
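The two components described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `DualMemoryCache`, the decay rates, the window size, the choice to fold keys into memory at eviction time, and the contiguous position assignment are all assumptions made for illustration. The sketch stores keys unrotated, compresses evicted keys into long-term and short-term memory tokens with exponential moving averages, and applies rotary embeddings only when keys are read for attention.

```python
import math

def ema_update(memory, key, decay):
    """Blend a new unrotated key into a memory token via an exponential moving average."""
    if memory is None:
        return list(key)
    return [decay * m + (1.0 - decay) * k for m, k in zip(memory, key)]

def rope_rotate(vec, position, base=10000.0):
    """Apply a rotary position embedding to an even-length vector."""
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

class DualMemoryCache:
    """Hypothetical fixed-size key cache: two EMA memory tokens plus a sliding window."""

    def __init__(self, window, long_decay=0.99, short_decay=0.8):
        # Decay rates are illustrative assumptions, not values from the paper.
        self.window = window
        self.long_decay = long_decay
        self.short_decay = short_decay
        self.long_mem = None    # slow EMA: global identity
        self.short_mem = None   # fast EMA: recent dynamics
        self.recent = []        # unrotated keys inside the sliding window

    def append(self, key):
        self.recent.append(list(key))
        if len(self.recent) > self.window:
            # Assumption: a key is folded into both streams when it leaves the window.
            evicted = self.recent.pop(0)
            self.long_mem = ema_update(self.long_mem, evicted, self.long_decay)
            self.short_mem = ema_update(self.short_mem, evicted, self.short_decay)

    def keys_for_attention(self):
        # Positions are assigned at read time (assumed contiguous here), so the
        # averaged memory tokens never mix keys rotated by different phases.
        raw = [m for m in (self.long_mem, self.short_mem) if m is not None]
        raw += self.recent
        return [rope_rotate(k, pos) for pos, k in enumerate(raw)]

cache = DualMemoryCache(window=3)
for t in range(10):
    cache.append([float(t), 1.0, 0.0, 2.0])
keys = cache.keys_for_attention()
print(len(keys))  # at most window + 2 entries, however many keys were appended
```

Averaging unrotated keys is the point of the design in this sketch: if keys were rotated before caching, each would carry a different positional phase and their average would no longer correspond to any single position.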
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Visual Attention and Saliency Detection · Video Analysis and Summarization
