PanoWan: Lifting Diffusion Video Generation Models to 360° with Latitude/Longitude-aware Mechanisms

Yifei Xia; Shuchen Weng; Siqi Yang; Jingqi Liu; Chengxuan Zhu; Minggui Teng; Zijian Jia; Han Jiang; Boxin Shi

arXiv:2505.22016 · cs.CV · June 26, 2025

TL;DR

PanoWan is a novel method that adapts pre-trained text-to-video models to high-quality 360° panoramic video generation by correcting latitudinal distortion and discontinuities at the longitude boundary, supported by a new panoramic video dataset, PanoVid.

Contribution

It introduces PanoWan, a lightweight framework that combines latitude-aware sampling with longitude-boundary handling (rotated semantic denoising and padded pixel-wise decoding), enabling effective transfer of pre-trained text-to-video models to panoramic video synthesis.
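
Why latitude needs special treatment follows from equirectangular (ERP) geometry alone: every row of an ERP frame spans the full 360° of longitude, yet the circle of latitude φ that the row samples has a circumference proportional to cos φ, so rows near the poles are horizontally over-sampled by roughly 1/cos φ. The minimal sketch below only computes that factor per row. It is not PanoWan's latitude-aware sampling, whose details this summary does not give, and the function names are hypothetical.

```python
import math


def row_latitude(row: int, height: int) -> float:
    """Latitude (radians) at the centre of an ERP pixel row.

    Row 0 is the top of the frame (near +90 deg); row height-1 is the bottom
    (near -90 deg). Using pixel centres keeps the latitude strictly inside
    (-90, +90) deg, so cos() never reaches zero.
    """
    return math.pi / 2 - (row + 0.5) * math.pi / height


def horizontal_stretch(row: int, height: int) -> float:
    """Approximate horizontal over-sampling factor 1/cos(latitude) of an ERP row."""
    return 1.0 / math.cos(row_latitude(row, height))


if __name__ == "__main__":
    H = 8  # hypothetical tiny frame height, for illustration only
    for r in range(H):
        lat_deg = math.degrees(row_latitude(r, H))
        print(f"row {r}: latitude {lat_deg:6.2f} deg, stretch x{horizontal_stretch(r, H):.3f}")
```

For an 8-row frame, the top and bottom rows come out stretched by roughly 5×, while the two rows adjacent to the equator are stretched by only about 2%. This latitude-dependent distortion is absent from the conventional videos that pre-trained text-to-video models are trained on.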

Findings

01 Achieves state-of-the-art panoramic video generation quality

02 Demonstrates robustness in zero-shot downstream tasks

03 Provides a new high-quality panoramic video dataset, PanoVid

Abstract

Panoramic video generation enables immersive 360° content creation, which is valuable in applications that demand scene-consistent world exploration. However, existing panoramic video generation models struggle to leverage the generative priors of pre-trained conventional text-to-video models to produce high-quality and diverse panoramic videos, owing to limited dataset scale and a gap in spatial feature representations. In this paper, we introduce PanoWan, which effectively lifts pre-trained text-to-video models to the panoramic domain with only minimal added modules. PanoWan employs latitude-aware sampling to avoid latitudinal distortion, while its rotated semantic denoising and padded pixel-wise decoding ensure seamless transitions at longitude boundaries. To provide sufficient panoramic videos for learning these lifted representations, we contribute PanoVid, a high-quality panoramic video…
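
The longitude seam exists because the left and right edges of an ERP frame are the same meridian. Two generic operations make that wraparound explicit: a circular column shift, which is exactly a yaw rotation of the panorama, and circular padding along the width. The sketch below, in plain Python on nested lists, assumes these are the kinds of operations that "rotated semantic denoising" and "padded pixel-wise decoding" build on. It illustrates that assumption only and is not the authors' implementation. All function names, including the `step_fn` placeholder for a denoising step, are hypothetical.

```python
from typing import Callable, List

Frame = List[List[int]]  # rows x columns of an ERP frame (toy integer "pixels")


def roll_longitude(frame: Frame, shift: int) -> Frame:
    """Rotate the panorama about its vertical axis by shifting columns circularly.

    Positive `shift` moves content right; columns pushed past the right edge
    reappear on the left, because both edges are the same meridian.
    """
    out = []
    for row in frame:
        s = shift % len(row)
        out.append(row[-s:] + row[:-s] if s else list(row))
    return out


def rotated_step(frame: Frame, shift: int, step_fn: Callable[[Frame], Frame]) -> Frame:
    """Apply a hypothetical per-step function to a rotated copy, then undo the rotation."""
    return roll_longitude(step_fn(roll_longitude(frame, shift)), -shift)


def circular_pad_width(frame: Frame, pad: int) -> Frame:
    """Pad each row with `pad` columns wrapped around from the opposite edge.

    Assumes 0 <= pad <= frame width.
    """
    if pad == 0:
        return [list(row) for row in frame]
    return [row[-pad:] + row + row[:pad] for row in frame]


def crop_width(frame: Frame, pad: int) -> Frame:
    """Remove the `pad` columns that circular_pad_width added on each side."""
    return [row[pad:len(row) - pad] for row in frame]


if __name__ == "__main__":
    frame = [[0, 1, 2, 3], [4, 5, 6, 7]]
    print(roll_longitude(frame, 1))      # [[3, 0, 1, 2], [7, 4, 5, 6]]
    print(circular_pad_width(frame, 1))  # [[3, 0, 1, 2, 3, 0], [7, 4, 5, 6, 7, 4]]
    # An identity "step" applied to a rotated copy returns the original frame.
    print(rotated_step(frame, 3, lambda f: f) == frame)  # True
```

Shifting by a different amount at each step moves the seam to a different column each time, so no single column always sits at the frame edge. Circular padding gives any local operator, such as a decoder's convolutions, real neighbours across the seam, and the padded margin is cropped away afterwards.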

Code & Models

Models

YOUSIKI/PanoWan (Hugging Face model, 19 downloads)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Human Motion and Animation