CubeComposer: Spatio-Temporal Autoregressive 4K 360° Video Generation from Perspective Video

Lingen Li; Guangzhi Wang; Xiaoyu Li; Zhaoyang Zhang; Qi Dou; Jinwei Gu; Tianfan Xue; Ying Shan

arXiv:2603.04291·cs.CV·March 5, 2026

Open Access · 1 Model · 1 Dataset

TL;DR

CubeComposer is an autoregressive diffusion model that natively generates 4K-resolution 360° videos in cubemap format from perspective video. Earlier methods support only ≤ 1K native generation and rely on post-hoc super-resolution, which limits the quality of VR content.

Contribution

It introduces a spatio-temporal autoregressive strategy, a cube face context management mechanism, and continuity-aware techniques for high-quality 4K 360° video synthesis.
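To make the idea of spatio-temporal autoregression over cube faces concrete, here is a minimal sketch of one possible generation schedule. It is an illustration only, not the authors' method: this page does not specify the actual ordering or how context is selected, so the face order, the `window` length, and the function name `generation_schedule` are all assumptions.

```python
# Hypothetical schedule: the real CubeComposer ordering is not given on this page.
FACES = ("front", "right", "back", "left", "up", "down")  # assumed face order


def generation_schedule(num_frames, window):
    """Yield (frames, face, context) steps, time-major then face-by-face.

    frames  -- range of frame indices generated in this step
    face    -- cube face generated in this step
    context -- tuple of (frames, face) pairs generated before this step
    """
    done = []
    for start in range(0, num_frames, window):
        frames = range(start, min(start + window, num_frames))
        for face in FACES:
            # Each step may condition on everything generated so far.
            yield frames, face, tuple(done)
            done.append((frames, face))


if __name__ == "__main__":
    for frames, face, context in generation_schedule(num_frames=10, window=4):
        print(f"frames {frames.start}-{frames.stop - 1}, face {face}, "
              f"{len(context)} earlier blocks available as context")
```

Each step produces one face over one temporal window, so only a fraction of the full 360° video is synthesized at a time. A practical system would likely keep a bounded subset of the earlier blocks as context rather than all of them.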

Findings

01 Outperforms state-of-the-art methods in native resolution quality

02 Supports practical VR applications with high-resolution output

03 Efficiently manages memory and coherence in 4K video generation

Abstract

Generating high-quality 360° panoramic videos from perspective input is one of the crucial applications for virtual reality (VR), whereby high-resolution videos are especially important for immersive experience. Existing methods are constrained by computational limitations of vanilla diffusion models, only supporting ≤ 1K resolution native generation and relying on suboptimal post super-resolution to increase resolution. We introduce CubeComposer, a novel spatio-temporal autoregressive diffusion model that natively generates 4K-resolution 360° videos. By decomposing videos into cubemap representations with six faces, CubeComposer autoregressively synthesizes content in a well-planned spatio-temporal order, reducing memory demands while enabling high-resolution output. Specifically, to address challenges in multi-dimensional autoregression, we propose: (1) a…
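For readers unfamiliar with the cubemap representation mentioned in the abstract, the following sketch shows one common way to resample a single equirectangular frame into six cube faces. It is a generic illustration, not code from the paper: the axis convention (+z forward, +x right, +y up), the face order, nearest-neighbour sampling, and the function names are all assumptions made here.

```python
import numpy as np

# Assumed face order and axes (+z forward, +x right, +y up); not from the paper.
FACES = ("front", "right", "back", "left", "up", "down")


def face_directions(face, size):
    """Return unit view directions of shape (size, size, 3) for one cube face."""
    # Pixel centres mapped to [-1, 1] on the face plane.
    t = (np.arange(size) + 0.5) / size * 2.0 - 1.0
    u, v = np.meshgrid(t, -t)  # u grows to the right, v grows upward
    one = np.ones_like(u)
    xyz = {
        "front": (u, v, one),
        "right": (one, v, -u),
        "back": (-u, v, -one),
        "left": (-one, v, u),
        "up": (u, one, -v),
        "down": (u, -one, v),
    }[face]
    d = np.stack(xyz, axis=-1)
    return d / np.linalg.norm(d, axis=-1, keepdims=True)


def equirect_to_face(pano, face, size):
    """Sample one cube face from an equirectangular image (nearest neighbour)."""
    h, w = pano.shape[:2]
    d = face_directions(face, size)
    lon = np.arctan2(d[..., 0], d[..., 2])           # longitude in [-pi, pi]
    lat = np.arcsin(np.clip(d[..., 1], -1.0, 1.0))   # latitude in [-pi/2, pi/2]
    col = ((lon / (2 * np.pi) + 0.5) * w).astype(int) % w
    row = np.clip(((0.5 - lat / np.pi) * h).astype(int), 0, h - 1)
    return pano[row, col]


def equirect_to_cubemap(pano, size):
    """Stack all six faces into an array of shape (6, size, size, ...)."""
    return np.stack([equirect_to_face(pano, f, size) for f in FACES])
```

Applying `equirect_to_cubemap` to each frame of a panoramic video yields six square face videos. Working on one face at a time is what lets a model handle a smaller spatial extent per step than the full panorama.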

Code & Models

Models

🤗 TencentARC/CubeComposer
model · 181 dl · ♡ 16

Datasets

l-li/4K360Vid
dataset · 15 dl

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Advanced Image Processing Techniques