SphereUFormer: A U-Shaped Transformer for Spherical 360 Perception

Yaniv Benny; Lior Wolf

arXiv:2412.06968 · cs.CV · December 11, 2024

TL;DR

SphereUFormer introduces a transformer architecture with spherical local self-attention for improved 360-degree perception, overcoming distortions of traditional projections and outperforming existing methods in depth and segmentation tasks.

Contribution

It presents a novel spherical transformer architecture with specialized modules for omnidirectional perception, advancing beyond prior projection-based and convolutional approaches.

Findings

01

Outperforms state-of-the-art in depth estimation

02

Achieves superior results in semantic segmentation

03

Operates effectively directly on spherical data

Abstract

This paper proposes a novel method for omnidirectional 360° perception. Most common previous methods relied on equirectangular projection. This representation is easily applicable to 2D operation layers but introduces distortions into the image. Other methods attempted to remove the distortions by maintaining a sphere representation but relied on complicated convolution kernels that failed to show competitive results. In this work, we introduce a transformer-based architecture that, by incorporating a novel "Spherical Local Self-Attention" and other spherically-oriented modules, successfully operates in the spherical domain and outperforms the state-of-the-art in 360° perception benchmarks for depth estimation and semantic segmentation.
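
The distortion that the abstract attributes to equirectangular projection can be made concrete with a small calculation. The sketch below is not taken from the paper; it only computes how much of the unit sphere a single pixel covers in each row of an equirectangular image. The function name `equirect_pixel_solid_angles` and the 512 x 1024 image size are illustrative assumptions.

```python
import numpy as np


def equirect_pixel_solid_angles(height: int, width: int) -> np.ndarray:
    """Solid angle (in steradians) covered by one pixel in each image row.

    Hypothetical helper for illustration; assumes rows are uniformly spaced
    in latitude and columns uniformly spaced in longitude.
    """
    # Latitude of the row boundaries, from the north pole down to the south pole.
    lat_edges = np.linspace(np.pi / 2, -np.pi / 2, height + 1)
    # Exact area of each latitude band on the unit sphere.
    band_area = 2 * np.pi * (np.sin(lat_edges[:-1]) - np.sin(lat_edges[1:]))
    # Every pixel in a row covers an equal share of that row's band.
    return band_area / width


height, width = 512, 1024  # assumed image size
omega = equirect_pixel_solid_angles(height, width)

# Summed over all pixels, the solid angles cover the full sphere (4*pi).
total = omega.sum() * width
# How much more of the sphere an equator pixel covers than a pole pixel.
ratio = omega[height // 2] / omega[0]
print(f"total = {total:.6f}, equator/pole ratio = {ratio:.1f}")
```

Under these assumptions, a pixel at the equator covers a few hundred times more of the sphere than a pixel in the top row, so a 2D layer with a fixed kernel size sees very different physical extents at different latitudes. Operating directly on a sphere representation, as the abstract describes, avoids this non-uniform sampling.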

Taxonomy

Topics: Advanced Vision and Imaging · CCD and CMOS Imaging Sensors · Robotics and Sensor-Based Localization

Methods: Convolution