PanoWan: Lifting Diffusion Video Generation Models to 360° with Latitude/Longitude-aware Mechanisms
Yifei Xia, Shuchen Weng, Siqi Yang, Jingqi Liu, Chengxuan Zhu, Minggui Teng, Zijian Jia, Han Jiang, Boxin Shi

TL;DR
PanoWan is a novel method that adapts pre-trained text-to-video models for high-quality 360-degree panoramic video generation by addressing spatial distortions and boundary issues, supported by a new panoramic video dataset.
Contribution
It introduces PanoWan, a lightweight framework with latitude-aware sampling and boundary handling mechanisms, enabling effective transfer of pre-trained models to panoramic video synthesis.
Findings
Achieves state-of-the-art panoramic video generation quality
Demonstrates robustness in zero-shot downstream tasks
Provides a new high-quality panoramic video dataset, PanoVid
Abstract
Panoramic video generation enables immersive 360° content creation, valuable in applications that demand scene-consistent world exploration. However, existing panoramic video generation models struggle to leverage pre-trained generative priors from conventional text-to-video models for high-quality and diverse panoramic video generation, due to limited dataset scale and the gap in spatial feature representations. In this paper, we introduce PanoWan to effectively lift pre-trained text-to-video models to the panoramic domain, equipped with minimal modules. PanoWan employs latitude-aware sampling to avoid latitudinal distortion, while its rotated semantic denoising and padded pixel-wise decoding ensure seamless transitions at longitude boundaries. To provide sufficient panoramic videos for learning these lifted representations, we contribute PanoVid, a high-quality panoramic video…
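The abstract names three mechanisms but does not give their implementation. The sketch below only illustrates the equirectangular geometry they respond to, under stated assumptions: rows are latitudes, columns are longitudes, and the left and right edges of a frame are neighbours on the sphere. All function names (`roll_longitude`, `pad_longitude`, `crop_longitude`, `latitude_weights`) are hypothetical and are not taken from the PanoWan code.

```python
import math

def roll_longitude(frame, shift):
    """Circularly shift every row by `shift` columns (a rotation about the vertical axis)."""
    width = len(frame[0])
    shift %= width
    if shift == 0:
        return [list(row) for row in frame]
    return [row[-shift:] + row[:-shift] for row in frame]

def pad_longitude(frame, pad):
    """Wrap-pad both sides with pixels from the opposite edge (requires 1 <= pad <= width)."""
    return [row[-pad:] + row + row[:pad] for row in frame]

def crop_longitude(frame, pad):
    """Remove the `pad` columns added on each side by pad_longitude."""
    return [row[pad:len(row) - pad] for row in frame]

def latitude_weights(height):
    """cos(latitude) at each row centre: the relative sphere area an equirectangular row covers."""
    return [math.cos(((i + 0.5) / height - 0.5) * math.pi) for i in range(height)]

# Toy 2x4 "frame" stored as lists of pixel values.
frame = [[0, 1, 2, 3],
         [4, 5, 6, 7]]
rolled = roll_longitude(frame, 1)
padded = pad_longitude(frame, 1)
restored = crop_longitude(padded, 1)
weights = latitude_weights(4)
```

Rolling moves the longitude seam to a different column, so a model that processes rolled inputs at different steps never sees the boundary in a fixed place. Wrap-padding gives a decoder real neighbouring pixels across the seam, and the padded columns are cropped afterwards. The cosine weights show why rows near the poles are stretched: they cover less area on the sphere than rows near the equator, which is the distortion a latitude-aware scheme would need to account for.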
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Human Motion and Animation
