JoPano: Unified Panorama Generation via Joint Modeling

Wancheng Feng; Chen An; Zhenliang He; Meina Kan; Shiguang Shan; Lukun Wang

arXiv:2512.06885 · cs.CV · December 9, 2025

TL;DR

JoPano unifies text-to-panorama and view-to-panorama generation in a single DiT-based model. A Joint-Face Adapter built on the cubemap representation lets a pretrained DiT generate the views of a panorama jointly, and Poisson Blending reduces seam inconsistencies.

Contribution

The paper presents a joint modeling framework, consisting of a Joint-Face Adapter for a pretrained DiT and a Poisson Blending step, that handles both text-to-panorama and view-to-panorama generation within a single model.

Findings

1. Achieves state-of-the-art results on FID, CLIP-FID, IS, and CLIP-Score metrics.
2. Effectively reduces seam inconsistencies with Poisson Blending.
3. Unifies two panorama generation tasks in a single, efficient model.
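
To make the Poisson Blending finding concrete, here is a minimal one-dimensional sketch of gradient-domain blending in general. It is not the paper's procedure: this summary does not say how the blending is applied to panoramas. The function name `poisson_blend_1d` and its parameters are hypothetical. The idea is to keep the second differences (the "gradients") of a source strip while forcing its two end samples to match the neighbouring content, which removes a visible jump at the seam.

```python
import numpy as np

def poisson_blend_1d(source, left_value, right_value):
    """Blend a 1D strip so its ends match the given boundary values.

    Solves the discrete Poisson equation f'' = source'' on the interior
    samples, with f fixed to left_value and right_value at the two ends.
    (Illustrative helper; not taken from the paper.)
    """
    g = np.asarray(source, dtype=float)
    n = g.size
    if n < 3:
        raise ValueError("need at least one interior sample")
    m = n - 2  # number of unknown interior samples

    # Discrete Laplacian on the interior: f[i-1] - 2*f[i] + f[i+1].
    A = (np.diag(np.full(m, -2.0))
         + np.diag(np.ones(m - 1), 1)
         + np.diag(np.ones(m - 1), -1))

    # Guidance term: Laplacian of the source strip.
    b = g[:-2] - 2.0 * g[1:-1] + g[2:]

    # Known boundary values move to the right-hand side.
    b[0] -= left_value
    b[-1] -= right_value

    f = np.empty(n)
    f[0], f[-1] = left_value, right_value
    f[1:-1] = np.linalg.solve(A, b)
    return f

# Usage: a strip that should start at 10 and end at 20 to match its neighbours.
strip = np.array([0.0, 1.0, 4.0, 9.0, 16.0])
blended = poisson_blend_1d(strip, left_value=10.0, right_value=20.0)
```

In one dimension the result equals the source plus a linear ramp that absorbs the mismatch at both ends. In two dimensions the same equation is solved over an image region, with the surrounding pixels as the boundary.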

Abstract

Panorama generation has recently attracted growing interest in the research community, with two core tasks, text-to-panorama and view-to-panorama generation. However, existing methods still face two major challenges: their U-Net-based architectures constrain the visual quality of the generated panoramas, and they usually treat the two core tasks independently, which leads to modeling redundancy and inefficiency. To overcome these challenges, we propose a joint-face panorama (JoPano) generation approach that unifies the two core tasks within a DiT-based model. To transfer the rich generative capabilities of existing DiT backbones learned from natural images to the panorama domain, we propose a Joint-Face Adapter built on the cubemap representation of panoramas, which enables a pretrained DiT to jointly model and generate different views of a panorama. We further apply Poisson Blending to…
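
The cubemap representation mentioned in the abstract can be illustrated with a short sketch that resamples an equirectangular panorama into six square faces. This shows only the geometry and is not the Joint-Face Adapter. The face names, face order, axis convention (x right, y up, z forward), and nearest-neighbour sampling are all assumptions made for this example, as are the function names.

```python
import numpy as np

# Face names and order are an assumption for illustration only.
FACES = ("front", "right", "back", "left", "up", "down")

def face_directions(face, size):
    """Unit view direction for each pixel centre of one cube face."""
    # Pixel centres mapped to [-1, 1] on the face plane.
    t = (np.arange(size) + 0.5) / size * 2.0 - 1.0
    a, b = np.meshgrid(t, -t)  # a grows left to right, b is positive at the top
    one = np.ones_like(a)
    axes = {
        "front": (a, b, one),    # looking along +z
        "right": (one, b, -a),   # looking along +x
        "back":  (-a, b, -one),  # looking along -z
        "left":  (-one, b, a),   # looking along -x
        "up":    (a, one, -b),   # looking along +y
        "down":  (a, -one, b),   # looking along -y
    }
    d = np.stack(axes[face], axis=-1)
    return d / np.linalg.norm(d, axis=-1, keepdims=True)

def equirect_to_cubemap(pano, size):
    """Sample six size-by-size faces from an equirectangular image (nearest neighbour)."""
    h, w = pano.shape[:2]
    faces = {}
    for face in FACES:
        d = face_directions(face, size)
        lon = np.arctan2(d[..., 0], d[..., 2])          # longitude in [-pi, pi]
        lat = np.arcsin(np.clip(d[..., 1], -1.0, 1.0))  # latitude in [-pi/2, pi/2]
        col = np.floor((lon / (2 * np.pi) + 0.5) * w).astype(int) % w
        row = np.clip(np.floor((0.5 - lat / np.pi) * h).astype(int), 0, h - 1)
        faces[face] = pano[row, col]
    return faces

# Usage: split a random 64x128 RGB panorama into six 16x16 faces.
pano = np.random.default_rng(0).random((64, 128, 3))
cube = equirect_to_cubemap(pano, 16)
```

Each face is an ordinary perspective view with a 90-degree field of view, which is presumably why a cubemap suits a backbone pretrained on natural images. Adjacent faces share edges, so independently generated faces can disagree at the seams.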

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques