SO3UFormer: Learning Intrinsic Spherical Features for Rotation-Robust Panoramic Segmentation
Qinfeng Zhu, Yunxi Jiang, Lei Fan

TL;DR
SO3UFormer is a rotation-robust panoramic segmentation architecture that learns intrinsic spherical features and maintains high performance under arbitrary 3D rotations, whereas existing models overfit to latitude cues.
Contribution
The paper introduces SO3UFormer, a rotation-robust model built on geometric principles, together with Pose35, a new benchmark dataset for evaluating panoramic segmentation under arbitrary rotations.
Findings
SO3UFormer maintains high accuracy under full SO(3) rotations.
Existing models fail catastrophically under arbitrary rotations.
The Pose35 dataset enables benchmarking of rotation robustness.
Abstract
Panoramic semantic segmentation models are typically trained under a strict gravity-aligned assumption. However, real-world captures often deviate from this canonical orientation due to unconstrained camera motions, such as the rotational jitter of handheld devices or the dynamic attitude shifts of aerial platforms. This discrepancy causes standard spherical Transformers to overfit global latitude cues, leading to performance collapse under 3D reorientations. To address this, we introduce SO3UFormer, a rotation-robust architecture designed to learn intrinsic spherical features that are less sensitive to the underlying coordinate frame. Our approach rests on three geometric pillars: (1) an intrinsic feature formulation that decouples the representation from the gravity vector by removing absolute latitude encoding; (2) quadrature-consistent spherical attention that accounts for…
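The 3D reorientations described above can be simulated on an equirectangular panorama by resampling it under a rotation matrix. The following is a minimal NumPy sketch for illustration only, not the authors' code or the Pose35 generation procedure. The function names `rotation_matrix` and `rotate_equirect`, the axis convention (z up, longitude measured around z), and the nearest-neighbour sampling are all assumptions made here.

```python
import numpy as np


def rotation_matrix(axis, angle_rad):
    """Rotation about `axis` by `angle_rad` (Rodrigues' formula). Hypothetical helper."""
    axis = np.asarray(axis, dtype=float)
    x, y, z = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])
    return np.eye(3) + np.sin(angle_rad) * K + (1.0 - np.cos(angle_rad)) * (K @ K)


def rotate_equirect(img, R):
    """Resample an equirectangular image (H x W [x C]) under rotation R.

    Assumed convention: z is up, rows run from the north pole (top) to the
    south pole (bottom), columns cover longitude [-pi, pi).
    Nearest-neighbour lookup keeps integer label maps valid.
    """
    H, W = img.shape[:2]
    # Longitude and latitude of every output pixel centre.
    lon = (np.arange(W) + 0.5) / W * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (np.arange(H) + 0.5) / H * np.pi
    lon, lat = np.meshgrid(lon, lat)
    # Unit direction vectors on the sphere, shape (H, W, 3).
    dirs = np.stack(
        [np.cos(lat) * np.cos(lon), np.cos(lat) * np.sin(lon), np.sin(lat)],
        axis=-1,
    )
    # Each output direction d reads from source direction R^T d;
    # as row vectors this is d^T R.
    src = dirs @ R
    src_lon = np.arctan2(src[..., 1], src[..., 0])
    src_lat = np.arcsin(np.clip(src[..., 2], -1.0, 1.0))
    # Back to pixel indices; longitude wraps, latitude is clamped.
    u = np.floor((src_lon + np.pi) / (2.0 * np.pi) * W).astype(int) % W
    v = np.clip(np.floor((np.pi / 2.0 - src_lat) / np.pi * H).astype(int), 0, H - 1)
    return img[v, u]
```

Under these assumptions, a rotation about the vertical axis reduces to a circular shift of the image columns, which a gravity-aligned model can absorb easily. Rotations about the horizontal axes (pitch and roll) move content across latitudes, which is the case where models relying on absolute latitude cues would be expected to degrade.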
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques
