SphereUFormer: A U-Shaped Transformer for Spherical 360 Perception
Yaniv Benny, Lior Wolf

TL;DR
SphereUFormer introduces a transformer architecture with spherical local self-attention for 360-degree perception. Because it operates directly on the sphere, it avoids the distortions of traditional projections and outperforms existing methods on depth estimation and semantic segmentation.
Contribution
The paper presents a novel spherical transformer architecture with dedicated spherical modules for omnidirectional perception, moving beyond prior projection-based and convolution-based approaches.
Findings
Outperforms the state of the art in depth estimation
Achieves superior results in semantic segmentation
Operates directly on spherical data
Abstract
This paper proposes a novel method for omnidirectional 360 perception. Most previous methods relied on the equirectangular projection. This representation is directly compatible with standard 2D layers but introduces distortions into the image. Other methods attempted to avoid these distortions by keeping a spherical representation, but they relied on complicated convolution kernels that failed to show competitive results. In this work, we introduce a transformer-based architecture that, by incorporating a novel "Spherical Local Self-Attention" and other spherically oriented modules, successfully operates in the spherical domain and outperforms the state of the art on 360 perception benchmarks for depth estimation and semantic segmentation.
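The abstract does not spell out how Spherical Local Self-Attention is formulated. As a rough, non-authoritative illustration of the general idea, the NumPy sketch below assumes a set of points on the unit sphere (a Fibonacci lattice, used here only as a hypothetical stand-in for whatever spherical mesh the paper uses), defines each point's neighborhood as its k nearest points by great-circle distance, and runs single-head scaled dot-product attention restricted to those neighbors. The function names, `k`, and the projection matrices `wq`, `wk`, `wv` are illustrative assumptions, not the authors' code.

```python
import numpy as np


def fibonacci_sphere(n):
    """Near-uniform unit vectors on the sphere.
    Hypothetical stand-in for a spherical mesh; not taken from the paper."""
    i = np.arange(n) + 0.5
    phi = np.arccos(1.0 - 2.0 * i / n)        # polar angle in [0, pi]
    theta = np.pi * (1.0 + 5.0 ** 0.5) * i    # golden-angle azimuth
    return np.stack([np.cos(theta) * np.sin(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(phi)], axis=1)


def knn_on_sphere(xyz, k):
    """Indices of the k nearest points by great-circle (geodesic) distance.
    Each point's own index comes first, since its distance to itself is ~0."""
    cos_sim = np.clip(xyz @ xyz.T, -1.0, 1.0)
    geodesic = np.arccos(cos_sim)
    return np.argsort(geodesic, axis=1)[:, :k]


def spherical_local_attention(x, neighbors, wq, wk, wv):
    """Single-head attention in which each point attends only to its
    sphere neighbors (an assumed, simplified form of local self-attention)."""
    q, k, v = x @ wq, x @ wk, x @ wv                 # (N, d) each
    k_nb, v_nb = k[neighbors], v[neighbors]          # (N, K, d) each
    scores = np.einsum("nd,nkd->nk", q, k_nb) / np.sqrt(q.shape[1])
    scores -= scores.max(axis=1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)    # softmax over neighbors
    return np.einsum("nk,nkd->nd", weights, v_nb)    # (N, d)


rng = np.random.default_rng(0)
n, c, d, k = 500, 16, 8, 9                           # points, in/out channels, neighbors
xyz = fibonacci_sphere(n)
nbrs = knn_on_sphere(xyz, k)
feats = rng.standard_normal((n, c))
wq, wk, wv = (rng.standard_normal((c, d)) for _ in range(3))
out = spherical_local_attention(feats, nbrs, wq, wk, wv)
print(out.shape)  # (500, 8)
```

Because neighborhoods here are defined by angular distance on the sphere rather than by pixel adjacency in an equirectangular grid, a point near a pole gets a neighborhood of roughly the same angular extent as a point at the equator. The equirectangular projection does not give this uniformity, which is the kind of distortion the abstract attributes to it.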
Taxonomy
Topics: Advanced Vision and Imaging · CCD and CMOS Imaging Sensors · Robotics and Sensor-Based Localization
Methods: Convolution
