Endless World: Real-Time 3D-Aware Long Video Generation
Ke Zhang, Yiqun Mei, Jiacong Xu, Vishal M. Patel

TL;DR
Endless World is a real-time framework for generating infinite, 3D-consistent videos by combining autoregressive training, global 3D-aware attention, and a 3D injection mechanism, enabling stable long-horizon video synthesis.
Contribution
The paper introduces a novel real-time 3D-aware video generation method that supports infinite sequences with long-range coherence and geometric consistency.
Findings
Produces long, stable, and coherent videos
Achieves real-time inference on a single GPU
Outperforms existing methods in visual fidelity and spatial consistency
Abstract
Producing long, coherent video sequences with stable 3D structure remains a major challenge, particularly in streaming scenarios. Motivated by this, we introduce Endless World, a real-time framework for infinite, 3D-consistent video generation. To support infinite video generation, we introduce a conditional autoregressive training strategy that aligns newly generated content with existing video frames. This design preserves long-range dependencies while remaining computationally efficient, enabling real-time inference on a single GPU without additional training overhead. Moreover, our Endless World integrates global 3D-aware attention to provide continuous geometric guidance across time. Our 3D injection mechanism enforces physical plausibility and geometric consistency throughout extended sequences, addressing key challenges in long-horizon and dynamic scene synthesis. Extensive…
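The streaming idea in the abstract, where each new piece of video is conditioned on frames that already exist, can be illustrated with a minimal sketch. This is not the authors' implementation: the names `rollout`, `generate_chunk`, `context_len`, and `chunk_len` are hypothetical, the sliding-window context is an assumption, and the toy generator stands in for a learned model. The sketch covers only the autoregressive loop and does not model the 3D-aware attention or the 3D injection mechanism.

```python
from collections import deque
from itertools import islice
from typing import Callable, Deque, Iterator, List

Frame = List[float]  # toy stand-in for a video frame or latent

def rollout(
    generate_chunk: Callable[[List[Frame], int], List[Frame]],
    first_frames: List[Frame],
    context_len: int,
    chunk_len: int,
) -> Iterator[Frame]:
    """Yield frames indefinitely, conditioning each new chunk on recent frames."""
    # Assumption: a fixed-size window of recent frames is the conditioning signal.
    # The bounded deque keeps memory constant however long the stream runs.
    context: Deque[Frame] = deque(first_frames, maxlen=context_len)
    for frame in first_frames:
        yield frame
    while True:
        # Generate the next chunk from the current context window.
        new_frames = generate_chunk(list(context), chunk_len)
        for frame in new_frames:
            context.append(frame)  # oldest frame drops out once the window is full
            yield frame

def toy_generate_chunk(context: List[Frame], chunk_len: int) -> List[Frame]:
    # Hypothetical placeholder for a learned generator: continue from the
    # last context frame by adding 1, 2, ..., chunk_len to each value.
    last = context[-1]
    return [[v + (i + 1) for v in last] for i in range(chunk_len)]

# Take the first 8 frames of an otherwise endless stream.
frames = list(islice(rollout(toy_generate_chunk, [[0.0]], context_len=4, chunk_len=3), 8))
print(frames)
```

Because `rollout` is a generator, the caller decides how many frames to consume, which mirrors the streaming setting: the sequence has no fixed length, and the cost per chunk depends only on the window size in this sketch.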
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Advanced Vision and Imaging
