ViewPoint: Panoramic Video Generation with Pretrained Diffusion Models

Zixun Fang; Kai Zhu; Zhiheng Liu; Yu Liu; Wei Zhai; Yang Cao; Zheng-Jun Zha

arXiv:2506.23513 · cs.CV · July 1, 2025

Open Access

TL;DR

This paper introduces ViewPoint, a framework that uses pretrained perspective video models and a new panorama representation to generate high-quality, spatially consistent panoramic videos, advancing the state of the art in VR content creation.

Contribution

The paper proposes a new panorama representation, the ViewPoint map, and a Pano-Perspective attention mechanism that together allow pretrained perspective models to be used effectively for panoramic video synthesis.

Findings

01. Achieves state-of-the-art performance in panoramic video generation.
02. Produces highly dynamic and spatially consistent videos.
03. Outperforms previous methods in quality and realism.

Abstract

Panoramic video generation aims to synthesize 360-degree immersive videos, holding significant importance in the fields of VR, world models, and spatial intelligence. Existing works fail to synthesize high-quality panoramic videos due to the inherent modality gap between panoramic data and perspective data, which constitutes the majority of the training data for modern diffusion models. In this paper, we propose a novel framework utilizing pretrained perspective video models for generating panoramic videos. Specifically, we design a novel panorama representation named ViewPoint map, which possesses global spatial continuity and fine-grained visual details simultaneously. With our proposed Pano-Perspective attention mechanism, the model benefits from pretrained perspective priors and captures the panoramic spatial correlations of the ViewPoint map effectively. Extensive experiments…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Human Motion and Animation