PanoNormal: Monocular Indoor 360° Surface Normal Estimation

Kun Huang; Fanglue Zhang; Neil Dodgson

arXiv:2405.18745 · cs.CV · January 26, 2026

TL;DR

PanoNormal is an architecture that combines CNNs and ViTs with spherical-aware self-attention for monocular surface normal estimation from 360° images; the authors report that it outperforms existing methods.

Contribution

The paper introduces a multi-level global self-attention mechanism tailored to spherical images, designed to capture both global and local geometric cues for surface normal prediction.

Findings

1. Achieves state-of-the-art results on benchmark datasets.
2. Significantly outperforms adapted depth estimation models.
3. Effectively models both global context and local details.

Abstract

The presence of spherical distortion in equirectangular projection (ERP) images presents a persistent challenge in dense regression tasks such as surface normal estimation. Although it may appear straightforward to repurpose architectures developed for 360° depth estimation, our empirical findings indicate that such models yield suboptimal performance when applied to surface normal prediction. This is largely attributed to their architectural bias toward capturing global scene layout, which comes at the expense of the fine-grained local geometric cues that are critical for accurate surface orientation estimation. While convolutional neural networks (CNNs) have been employed to mitigate spherical distortion, their fixed receptive fields limit their ability to capture holistic scene structure. Conversely, vision transformers (ViTs) are capable of modeling long-range dependencies via…
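To make the spherical distortion mentioned in the abstract concrete, the following is a minimal sketch of standard ERP geometry, not code from the paper. It maps an ERP pixel to a unit direction on the sphere and computes the solid angle a pixel covers in a given row. The function names, the axis convention (y up, z forward), and the 512×1024 resolution are assumptions chosen only for illustration.

```python
import math


def erp_pixel_to_direction(row, col, height, width):
    """Map the centre of ERP pixel (row, col) to a unit vector on the sphere.

    Assumed convention: longitude spans [-pi, pi) across the image width,
    latitude runs from +pi/2 (top row) to -pi/2 (bottom row), y points up.
    """
    lon = ((col + 0.5) / width - 0.5) * 2.0 * math.pi
    lat = (0.5 - (row + 0.5) / height) * math.pi
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)


def erp_pixel_solid_angle(row, height, width):
    """Solid angle (steradians) covered by a single pixel in the given row.

    Integrating cos(lat) over the latitude band of the row gives
    sin(lat_top) - sin(lat_bottom); every pixel spans 2*pi/width in longitude.
    """
    lat_top = (0.5 - row / height) * math.pi
    lat_bottom = (0.5 - (row + 1) / height) * math.pi
    d_lon = 2.0 * math.pi / width
    return d_lon * (math.sin(lat_top) - math.sin(lat_bottom))


# Hypothetical image size, used only for this illustration.
height, width = 512, 1024

# Summing over all pixels should recover the full sphere (4*pi steradians).
total = sum(erp_pixel_solid_angle(r, height, width) * width for r in range(height))

# A pixel near the equator covers far more of the sphere than one at the pole.
equator = erp_pixel_solid_angle(height // 2, height, width)
pole = erp_pixel_solid_angle(0, height, width)

print(f"total solid angle / 4*pi = {total / (4.0 * math.pi):.6f}")
print(f"equator-to-pole pixel area ratio = {equator / pole:.1f}")
```

Every ERP pixel has the same size in the image, but pixels near the poles cover a much smaller patch of the sphere than pixels near the equator. This is why fixed-size convolution kernels and uniform image patches see inconsistent amounts of scene content at different latitudes.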

Taxonomy

Topics: Advanced Vision and Imaging · Optical measurement and interference techniques · Robotics and Sensor-Based Localization