PanoNormal: Monocular Indoor 360° Surface Normal Estimation
Kun Huang, Fanglue Zhang, Neil Dodgson

TL;DR
PanoNormal is a novel architecture combining CNNs and ViTs with spherical-aware self-attention to improve monocular surface normal estimation in 360° images, outperforming existing methods.
Contribution
It introduces a multi-level global self-attention mechanism tailored for spherical images, effectively capturing both global and local geometric cues for surface normal prediction.
Findings
Achieves state-of-the-art results on benchmark datasets.
Significantly outperforms adapted depth estimation models.
Effectively models both global context and local details.
Abstract
The presence of spherical distortion in equirectangular projection (ERP) images presents a persistent challenge in dense regression tasks such as surface normal estimation. Although it may appear straightforward to repurpose architectures developed for 360° depth estimation, our empirical findings indicate that such models yield suboptimal performance when applied to surface normal prediction. This is largely attributed to their architectural bias toward capturing global scene layout, which comes at the expense of the fine-grained local geometric cues that are critical for accurate surface orientation estimation. While convolutional neural networks (CNNs) have been employed to mitigate spherical distortion, their fixed receptive fields limit their ability to capture holistic scene structure. Conversely, vision transformers (ViTs) are capable of modeling long-range dependencies via…
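The spherical distortion mentioned in the abstract can be made concrete with a small sketch. It is not taken from the paper: the function names `erp_pixel_directions` and `erp_solid_angle_weights`, the y-up axis convention, and the pixel-centre sampling are all assumptions chosen for illustration. The code maps each ERP pixel to a unit direction on the sphere and computes the solid angle each pixel covers, which shrinks with the cosine of latitude toward the poles.

```python
import numpy as np


def erp_pixel_directions(height, width):
    """Unit view direction for each pixel centre of an ERP image.

    Assumed convention (not from the paper): y points up, z points
    toward the image centre, and longitude increases to the right.
    """
    # Latitude runs from near +pi/2 (top row) to near -pi/2 (bottom row).
    lat = (0.5 - (np.arange(height) + 0.5) / height) * np.pi
    # Longitude runs from near -pi (left column) to near +pi (right column).
    lon = ((np.arange(width) + 0.5) / width - 0.5) * 2.0 * np.pi
    lon, lat = np.meshgrid(lon, lat)  # both shaped (height, width)
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)
    return np.stack([x, y, z], axis=-1)


def erp_solid_angle_weights(height, width):
    """Approximate solid angle (steradians) covered by each ERP pixel."""
    lat = (0.5 - (np.arange(height) + 0.5) / height) * np.pi
    d_lat = np.pi / height
    d_lon = 2.0 * np.pi / width
    # Every pixel spans the same angular step, but its footprint on the
    # sphere is scaled by cos(latitude), so polar rows are oversampled.
    row_weight = np.cos(lat) * d_lat * d_lon
    return np.repeat(row_weight[:, None], width, axis=1)


if __name__ == "__main__":
    dirs = erp_pixel_directions(64, 128)
    weights = erp_solid_angle_weights(64, 128)
    print(dirs.shape, weights.shape)
    print("top-row / equator-row weight ratio:", weights[0, 0] / weights[32, 0])
```

Weights of this kind are one common way to keep per-pixel losses or error metrics from being dominated by the oversampled polar rows; whether PanoNormal uses such weighting is not stated in the text above.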
Taxonomy
Topics: Advanced Vision and Imaging · Optical measurement and interference techniques · Robotics and Sensor-Based Localization
