Endless World: Real-Time 3D-Aware Long Video Generation

Ke Zhang; Yiqun Mei; Jiacong Xu; Vishal M. Patel

arXiv:2512.12430 · cs.CV · December 16, 2025

TL;DR

Endless World is a real-time framework for generating infinite, 3D-consistent videos by combining autoregressive training, global 3D-aware attention, and a 3D injection mechanism, enabling stable long-horizon video synthesis.

Contribution

The paper introduces a novel real-time 3D-aware video generation method that supports infinite sequences with long-range coherence and geometric consistency.

Findings

01. Produces long, stable, and coherent videos.
02. Achieves real-time inference on a single GPU.
03. Outperforms existing methods in visual fidelity and spatial consistency.

Abstract

Producing long, coherent video sequences with stable 3D structure remains a major challenge, particularly in streaming scenarios. Motivated by this, we introduce Endless World, a real-time framework for infinite, 3D-consistent video generation. To support infinite video generation, we introduce a conditional autoregressive training strategy that aligns newly generated content with existing video frames. This design preserves long-range dependencies while remaining computationally efficient, enabling real-time inference on a single GPU without additional training overhead. Moreover, Endless World integrates global 3D-aware attention to provide continuous geometric guidance across time. Our 3D injection mechanism enforces physical plausibility and geometric consistency throughout extended sequences, addressing key challenges in long-horizon and dynamic scene synthesis. Extensive…
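
The abstract does not spell out the generation loop, so the following is only a generic illustration, not the authors' implementation. It sketches chunk-wise autoregressive streaming in which each new chunk is conditioned on a bounded window of previously generated frames, one common way to keep per-step cost constant as a video grows without limit. The names `stream_video`, `generate_chunk`, and `context_size` are hypothetical, and the toy integer generator merely stands in for a video model.

```python
import itertools
from collections import deque
from typing import Callable, Iterator, List, Sequence, TypeVar

Frame = TypeVar("Frame")  # hypothetical stand-in for a video frame (tensor, image, ...)


def stream_video(
    generate_chunk: Callable[[List[Frame]], List[Frame]],
    first_frames: Sequence[Frame],
    context_size: int,
) -> Iterator[Frame]:
    """Yield frames indefinitely, conditioning each new chunk on at most
    `context_size` of the most recent frames (hypothetical helper)."""
    # A bounded deque drops the oldest frame automatically, so the conditioning
    # window (and hence per-chunk cost) does not grow with video length.
    context = deque(first_frames, maxlen=context_size)
    yield from first_frames
    while True:
        # The model only ever sees the recent window, never the full history.
        chunk = generate_chunk(list(context))
        for frame in chunk:
            context.append(frame)
            yield frame


# Toy usage: integer "frames" and a dummy generator that records how much
# context it received on each call.
seen_context_sizes: List[int] = []


def toy_generate_chunk(context: List[int]) -> List[int]:
    seen_context_sizes.append(len(context))
    last = context[-1]
    return [last + 1, last + 2]  # a chunk of two new frames


frames = list(itertools.islice(stream_video(toy_generate_chunk, [0], context_size=4), 10))
print(frames)              # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(seen_context_sizes)  # [1, 3, 4, 4, 4]
```

The conditioning window never exceeds `context_size` frames, however long the stream runs. This sketch does not model how Endless World's global 3D-aware attention or 3D injection mechanism supply long-range geometric guidance.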

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Advanced Vision and Imaging