Infinity-RoPE: Action-Controllable Infinite Video Generation Emerges From Autoregressive Self-Rollout

Hidir Yesiltepe; Tuna Han Salih Meral; Adil Kaan Akan; Kaan Oktay; Pinar Yanardag

arXiv:2511.20649·cs.CV·March 20, 2026

TL;DR

The paper introduces $\infty$-RoPE, a training-free, inference-time framework for infinite-length, action-controllable video generation with cinematic scene transitions. It addresses key limitations of existing autoregressive video diffusion models through a reformulated temporal encoding and KV-cache management.

Contribution

It presents $\infty$-RoPE, an inference-time framework that supports unbounded-length generation, fine-grained action control, and discontinuous scene transitions without retraining, addressing three core bottlenecks in autoregressive video diffusion models.

Findings

01. Surpasses previous models in VBench scores.

02. Enables continuous video generation beyond the base model's frame horizon.

03. Supports scene transitions within a single generation stream.

Abstract

Current autoregressive video diffusion models are constrained by three core bottlenecks: (i) the finite temporal horizon imposed by the base model's 3D Rotary Positional Embedding (3D-RoPE), (ii) slow prompt responsiveness in maintaining fine-grained action control during long-form rollouts, and (iii) the inability to realize discontinuous cinematic transitions within a single generation stream. We introduce $\infty$-RoPE, a unified inference-time framework that addresses all three limitations through three interconnected components: Block-Relativistic RoPE, KV Flush, and RoPE Cut. Block-Relativistic RoPE reformulates temporal encoding as a moving local reference frame, where each newly generated latent block is rotated relative to the base model's maximum frame horizon while earlier blocks are rotated backward to preserve relative temporal geometry. This relativistic formulation…
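
The moving-reference-frame idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The names `temporal_indices`, `rope_rotate`, `score`, `block_len`, and `f_max` are hypothetical, only the temporal axis of 3D-RoPE is modeled, and dropping blocks that would fall below index 0 is an assumption made here for simplicity. The sketch pins the newest block to the end of the horizon, shifts older blocks backward by whole blocks, and checks that a RoPE attention score depends only on the relative offset, so such a shift leaves the relative temporal geometry unchanged.

```python
import cmath

def temporal_indices(num_blocks, block_len, f_max):
    """Hypothetical helper: temporal positions for cached blocks, oldest first.

    The newest block always occupies the last `block_len` positions below
    `f_max`; each older block sits one block further back. Blocks that would
    fall below index 0 are dropped (an assumption of this sketch).
    """
    window = f_max // block_len          # how many blocks fit in the horizon
    kept = min(num_blocks, window)
    newest_start = f_max - block_len
    blocks = []
    for age in range(kept - 1, -1, -1):  # age 0 is the newest block
        start = newest_start - age * block_len
        blocks.append(list(range(start, start + block_len)))
    return blocks

def rope_rotate(vec, pos, base=10000.0):
    """Apply 1D RoPE to a list of complex numbers (one per frequency pair)."""
    n = len(vec)
    return [v * cmath.exp(1j * pos * base ** (-i / n)) for i, v in enumerate(vec)]

def score(q, k):
    """Real part of the complex inner product, i.e. the attention logit."""
    return sum((a * b.conjugate()).real for a, b in zip(q, k))

# Three blocks of three latent frames under a 12-frame horizon.
print(temporal_indices(3, 3, 12))  # [[3, 4, 5], [6, 7, 8], [9, 10, 11]]

# Shifting query and key positions by the same amount keeps the score fixed.
q = [1 + 2j, 0.5 - 1j]
k = [0.3 + 0.7j, -1 + 0.2j]
before = score(rope_rotate(q, 5), rope_rotate(k, 2))
after = score(rope_rotate(q, 8), rope_rotate(k, 5))
print(abs(before - after) < 1e-9)  # True
```

Because every index stays below `f_max`, positions never leave the range the base model saw in training, no matter how many blocks have been generated.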

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition · Human Motion and Animation