More than the Sum: Panorama-Language Models for Adverse Omni-Scenes

Weijia Fan; Ruiping Liu; Jiale Wei; Yufan Chen; Junwei Zheng; Zichao Zeng; Jiaming Zhang; Qiufu Li; Linlin Shen; Rainer Stiefelhagen

arXiv:2603.09573·cs.CV·April 7, 2026

TL;DR

This paper introduces a panoramic vision-language reasoning paradigm and dataset, enabling existing models to better understand 360-degree scenes with occlusions and adverse conditions.

Contribution

The paper proposes the Panorama-Language Modeling (PLM) paradigm, the PanoVQA panoramic VQA dataset, and a plug-and-play panoramic sparse attention module for improved omni-scene understanding.

Findings

01

PLM achieves superior robustness in challenging omni-scenes.

02

The panoramic attention module enables existing models to process panoramas without retraining.

03

Extensive experiments validate the effectiveness of the proposed approach.

Abstract

Existing vision-language models (VLMs) are tailored for pinhole imagery, stitching multiple narrow field-of-view inputs to piece together a complete omni-scene understanding. Yet, such multi-view perception overlooks the holistic spatial and contextual relationships that a single panorama inherently preserves. In this work, we introduce the Panorama-Language Modeling (PLM) paradigm, a unified $360^{\circ}$ vision-language reasoning that is more than the sum of its pinhole counterparts. Besides, we present PanoVQA, a large-scale panoramic VQA dataset that involves adverse omni-scenes, enabling comprehensive reasoning under object occlusions and driving accidents. To establish a foundation for PLM, we develop a plug-and-play panoramic sparse attention module that allows existing pinhole-based VLMs to process equirectangular panoramas without retraining. Extensive experiments demonstrate that…
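For readers unfamiliar with the equirectangular format mentioned in the abstract, the following is a minimal sketch of the standard pixel-to-sphere mapping. It is not the paper's attention module; the function names `pixel_to_direction` and `angular_distance`, the axis convention, and the 2048x1024 image size are assumptions chosen for illustration. It shows one property that pinhole-oriented processing does not account for: the left and right image borders of a panorama are neighbors on the sphere.

```python
import math

def pixel_to_direction(u, v, width, height):
    """Map an equirectangular pixel (u, v) to a unit viewing direction.

    Assumed convention (illustrative): longitude spans [-pi, pi) across the
    image width, latitude spans (+pi/2, -pi/2) from top row to bottom row.
    """
    lon = (u + 0.5) / width * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v + 0.5) / height * math.pi
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)

def angular_distance(a, b):
    """Angle in radians between two unit vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    return math.acos(max(-1.0, min(1.0, dot)))

# Hypothetical panorama size, used only for this example.
width, height = 2048, 1024
row = height // 2

left = pixel_to_direction(0, row, width, height)
right = pixel_to_direction(width - 1, row, width, height)
center = pixel_to_direction(width // 2, row, width, height)

# The two border pixels are far apart in image space but adjacent on the sphere.
edge_gap = angular_distance(left, right)
center_gap = angular_distance(left, center)
```

Here `edge_gap` comes out far smaller than `center_gap`, even though the border pixels are 2047 columns apart in the image while the center pixel is only 1024 columns from the left edge.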

Code & Models

Repositories

InSAI-Lab/PanoVQA (GitHub)
