CubeComposer: Spatio-Temporal Autoregressive 4K 360° Video Generation from Perspective Video
Lingen Li, Guangzhi Wang, Xiaoyu Li, Zhaoyang Zhang, Qi Dou, Jinwei Gu, Tianfan Xue, Ying Shan

TL;DR
CubeComposer is a novel autoregressive diffusion model that directly generates high-resolution 4K 360-degree videos in a cubemap format, overcoming previous resolution limitations and enhancing VR content quality.
Contribution
It introduces a spatio-temporal autoregressive strategy, a cube face context management mechanism, and continuity-aware techniques for high-quality 4K 360° video synthesis.
Findings
Outperforms state-of-the-art methods in native resolution quality
Supports practical VR applications with high-resolution output
Efficiently manages memory and coherence in 4K video generation
Abstract
Generating high-quality 360° panoramic videos from perspective input is one of the crucial applications for virtual reality (VR), whereby high-resolution videos are especially important for immersive experience. Existing methods are constrained by computational limitations of vanilla diffusion models, only supporting 1K resolution native generation and relying on suboptimal post super-resolution to increase resolution. We introduce CubeComposer, a novel spatio-temporal autoregressive diffusion model that natively generates 4K-resolution 360° videos. By decomposing videos into cubemap representations with six faces, CubeComposer autoregressively synthesizes content in a well-planned spatio-temporal order, reducing memory demands while enabling high-resolution output. Specifically, to address challenges in multi-dimensional autoregression, we propose: (1) a…
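The cubemap decomposition mentioned in the abstract can be illustrated with a minimal sketch of the standard direction-to-face mapping. This is not the authors' code: the face names, the z-up axis convention, the u/v orientation, and the rule that a face side equals a quarter of the equirectangular width are all assumptions made for illustration, and the 3840-pixel width is only an example value for "4K".

```python
import math

# Hypothetical face labels; real pipelines often use front/back/left/right/up/down.
FACES = ("+x", "-x", "+y", "-y", "+z", "-z")


def direction_to_face_uv(x, y, z):
    """Map a nonzero 3D direction to (face, u, v), with u and v in [-1, 1].

    The face is chosen by the axis with the largest absolute component.
    The orientation of u and v on each face is an assumed convention.
    """
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:
        face = "+x" if x > 0 else "-x"
        u, v = y / ax, z / ax
    elif ay >= az:
        face = "+y" if y > 0 else "-y"
        u, v = x / ay, z / ay
    else:
        face = "+z" if z > 0 else "-z"
        u, v = x / az, y / az
    return face, u, v


def equirect_to_direction(lon, lat):
    """Convert longitude/latitude in radians to a unit direction (z-up assumed)."""
    return (
        math.cos(lat) * math.cos(lon),
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
    )


def face_size_for_equirect(width):
    """Assumed convention: each cube face side is one quarter of the panorama width."""
    return width // 4


if __name__ == "__main__":
    side = face_size_for_equirect(3840)      # example width, not taken from the paper
    print(side, side * side, 6 * side * side)
    print(direction_to_face_uv(*equirect_to_direction(0.0, 0.0)))
```

Under these assumptions, a model that synthesizes one face at a time handles one sixth of the cubemap's pixels per step, which is the intuition behind the reduced memory demand. The actual generation order and face resolution used by CubeComposer are not specified in this excerpt.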
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Advanced Image Processing Techniques
