MemRoPE: Training-Free Infinite Video Generation via Evolving Memory Tokens

Youngrae Kim; Qixin Hu; C.-C. Jay Kuo; Peter A. Beerel

arXiv:2603.12513 · cs.CV · March 16, 2026

TL;DR

MemRoPE introduces a training-free method for infinite video generation that maintains long-term and short-term memory streams to improve temporal coherence and fidelity over extended durations.

Contribution

It proposes a training-free framework that combines evolving memory tokens with online RoPE indexing to improve long-horizon video generation while mitigating fidelity degradation.

Findings

1. Outperforms existing methods in temporal coherence
2. Improves visual fidelity in long-duration videos
3. Maintains subject consistency over extended periods

Abstract

Autoregressive diffusion enables real-time frame streaming, yet existing sliding-window caches discard past context, causing fidelity degradation, identity drift, and motion stagnation over long horizons. Current approaches preserve a fixed set of early tokens as attention sinks, but this static anchor cannot reflect the evolving content of a growing video. We introduce MemRoPE, a training-free framework with two co-designed components. Memory Tokens continuously compress all past keys into dual long-term and short-term streams via exponential moving averages, maintaining both global identity and recent dynamics within a fixed-size cache. Online RoPE Indexing caches unrotated keys and applies positional embeddings dynamically at attention time, ensuring the aggregation is free of conflicting positional phases. These two mechanisms are mutually enabling: positional decoupling makes…
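The abstract describes Memory Tokens and Online RoPE Indexing only at a high level. The following is a minimal NumPy sketch of how the two ideas could fit together, written under our own assumptions and not taken from the authors' code. The names `rope`, `MemoryCache` and `attend` are hypothetical. The decay rates (0.99 long-term, 0.5 short-term) are illustrative. We assume the EMA is taken token-wise across frames, so each evicted frame's tokens are blended into the same memory slots, and that values are averaged alongside keys. We also assume the cache is ordered as [long-term, short-term, window] with contiguous positions that are reassigned at every attention call. The attention is single-head.

```python
import numpy as np


def rope(x, positions, base=10000.0):
    """Rotary position embedding (rotate-half convention) for x of shape (n, d), d even."""
    half = x.shape[1] // 2
    inv_freq = base ** (-np.arange(half) / half)                       # per-pair frequencies
    angles = np.outer(np.asarray(positions, dtype=float), inv_freq)    # (n, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=1)


class MemoryCache:
    """Hypothetical fixed-size cache: a sliding window of recent frames plus two EMA
    memory streams (long-term, short-term) that absorb frames evicted from the window.
    Keys are stored unrotated; RoPE is applied only when attention is computed."""

    def __init__(self, window_frames, alpha_long=0.99, alpha_short=0.5):
        self.window_frames = window_frames
        self.alpha_long, self.alpha_short = alpha_long, alpha_short
        self.window_k, self.window_v = [], []   # per-frame (tokens, d) arrays, unrotated
        self.long_k = self.long_v = self.short_k = self.short_v = None

    @staticmethod
    def _ema(mem, new, alpha):
        # The first evicted frame initializes a stream; later frames are blended in.
        return new.copy() if mem is None else alpha * mem + (1.0 - alpha) * new

    def append_frame(self, k, v):
        self.window_k.append(k)
        self.window_v.append(v)
        if len(self.window_k) > self.window_frames:
            old_k, old_v = self.window_k.pop(0), self.window_v.pop(0)
            # Fold the evicted frame into both streams instead of discarding it.
            # Averaging is phase-safe here because the cached keys are unrotated.
            self.long_k = self._ema(self.long_k, old_k, self.alpha_long)
            self.long_v = self._ema(self.long_v, old_v, self.alpha_long)
            self.short_k = self._ema(self.short_k, old_k, self.alpha_short)
            self.short_v = self._ema(self.short_v, old_v, self.alpha_short)

    def keys_values(self):
        """Return (rotated keys, values, positions); positions are reassigned on every call."""
        ks, vs = [], []
        if self.long_k is not None:
            ks += [self.long_k, self.short_k]
            vs += [self.long_v, self.short_v]
        ks += self.window_k
        vs += self.window_v
        k = np.concatenate(ks, axis=0)
        v = np.concatenate(vs, axis=0)
        positions = np.arange(k.shape[0])
        return rope(k, positions), v, positions


def attend(q, cache):
    """Single-head softmax attention for new query tokens q of shape (n_q, d),
    positioned immediately after the cached tokens (cache must be non-empty)."""
    k_rot, v, pos = cache.keys_values()
    q_rot = rope(q, pos[-1] + 1 + np.arange(q.shape[0]))
    scores = q_rot @ k_rot.T / np.sqrt(q.shape[1])
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v


rng = np.random.default_rng(0)
cache = MemoryCache(window_frames=3)
for _ in range(10):   # stream 10 frames of 4 tokens each, head dim 8
    cache.append_frame(rng.standard_normal((4, 8)), rng.standard_normal((4, 8)))
k_rot, v, pos = cache.keys_values()
print(k_rot.shape)    # (20, 8): 2 memory streams + 3 window frames, 4 tokens each
out = attend(rng.standard_normal((4, 8)), cache)
```

Because positions are reassigned on each call, the indices in this sketch never exceed the fixed cache length (20 tokens here), however many frames are streamed. The truncated abstract does not say whether MemRoPE assigns positions in exactly this way.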

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Visual Attention and Saliency Detection · Video Analysis and Summarization