PanoWorld: Towards Spatial Supersensing in 360$^\circ$ Panorama World
Changpeng Wang, Xin Lin, Junhan Liu, Yuheng Liu, Zhen Wang, Donglian Qi, Yunfeng Yan, Xi Chen

TL;DR
This paper introduces PanoWorld, a model for understanding equirectangular-projection (ERP) panoramas that uses geometry-aware supervision and spherical attention to improve spatial reasoning in 360-degree environments.
Contribution
The paper develops a pano-native understanding framework that combines a geometry-aware training pipeline with a spherical spatial attention mechanism to advance panoramic spatial reasoning.
Findings
PanoWorld outperforms baselines on multiple spatial reasoning benchmarks.
Geometry-aware supervision significantly improves ERP-native spatial understanding.
Spherical spatial cross-attention enhances model reasoning over panoramic data.
Abstract
Multimodal large language models (MLLMs) still struggle with spatial understanding under the dominant perspective-image paradigm, which inherits the narrow field of view of human-like perception. For navigation, robotic search, and 3D scene understanding, 360-degree panoramic sensing offers a form of supersensing by capturing the entire surrounding environment at once. However, existing MLLM pipelines typically decompose panoramas into multiple perspective views, leaving the spherical structure of equirectangular projection (ERP) largely implicit. In this paper, we study pano-native understanding, which requires an MLLM to reason over an ERP panorama as a continuous, observer-centered space. To this end, we first define the key abilities for pano-native understanding, including semantic anchoring, spherical localization, reference-frame transformation, and depth-aware 3D spatial…
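To make the "spherical structure" of an ERP image concrete, the following is a minimal sketch of the standard equirectangular mapping from pixel coordinates to viewing directions. It is not taken from the paper: the function names, the pixel-center convention, and the axis convention (y up, z forward) are assumptions chosen for illustration.

```python
import math

def erp_pixel_to_lonlat(u, v, width, height):
    """Map an ERP pixel (u, v) to (longitude, latitude) in radians.

    Assumed convention: pixel centers at +0.5, longitude in [-pi, pi)
    increasing left to right, latitude +pi/2 at the top row.
    """
    lon = (u + 0.5) / width * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v + 0.5) / height * math.pi
    return lon, lat

def lonlat_to_direction(lon, lat):
    """Convert (longitude, latitude) to a unit vector (x right, y up, z forward)."""
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return x, y, z

def angular_distance(p, q, width, height):
    """Great-circle angle in radians between two ERP pixels p and q."""
    a = lonlat_to_direction(*erp_pixel_to_lonlat(*p, width, height))
    b = lonlat_to_direction(*erp_pixel_to_lonlat(*q, width, height))
    dot = sum(i * j for i, j in zip(a, b))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot)))
```

In a 360 x 180 image, the pixels at columns 0 and 359 of the same middle row are 359 columns apart in the image plane, yet `angular_distance` reports them as about one degree apart on the sphere. This wrap-around adjacency is the kind of structure that stays implicit when a panorama is treated as a flat image or cut into separate perspective views.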
