Spherical Transformer
Sungmin Cho, Raehyuk Jung, Junseok Kwon

TL;DR
This paper introduces a transformer-based approach for 360-image classification that avoids planar projection distortions and achieves low rotation equivariance errors through a novel sampling method based on regular polyhedrons.
Contribution
The paper proposes a transformer architecture for 360-images that eliminates planar projection distortions and improves rotation equivariance using polyhedron-based sampling.
Findings
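Avoids planar projection distortion by sampling pixels directly from the sphere surface.
Achieves low rotation equivariance error, because specific rotations reduce to permutations of the polyhedron's faces.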
Performs competitively on SPH-MNIST, SPH-CIFAR, and SUN360 datasets.
Abstract
Using convolutional neural networks for 360° images can induce sub-optimal performance due to distortions entailed by a planar projection. The distortion worsens when a rotation is applied to the 360° image. Thus, many convolution-based studies attempt to reduce the distortions in order to learn accurate representations. In contrast, we leverage the transformer architecture to solve image classification problems for 360° images. Using the proposed transformer for 360° images has two advantages. First, our method does not require the erroneous planar projection process, because it samples pixels from the sphere surface. Second, our sampling method based on regular polyhedrons yields low rotation equivariance errors, because specific rotations can be reduced to permutations of faces. In experiments, we validate our network on two aspects, as follows. First, we show that using a transformer with…
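The claim that specific rotations reduce to permutations of faces can be checked numerically. The sketch below is an illustration only, not the authors' implementation: it assumes an icosahedron as the regular polyhedron (the abstract does not say which one is used), and all function names are hypothetical. It builds the 20 face centers, rotates them by 72 degrees about an axis through a vertex (a symmetry of the icosahedron), and recovers the resulting face permutation. A rotation that is not a symmetry yields no such permutation.

```python
import itertools
import numpy as np

PHI = (1 + 5 ** 0.5) / 2  # golden ratio


def icosahedron_vertices():
    """12 vertices: cyclic permutations of (0, +-1, +-PHI); edge length is 2."""
    verts = []
    for a, b in itertools.product((-1.0, 1.0), repeat=2):
        verts += [(0.0, a, b * PHI), (a, b * PHI, 0.0), (b * PHI, 0.0, a)]
    return np.array(verts)


def icosahedron_faces(verts):
    """Faces are the vertex triples whose three pairwise distances equal 2."""
    faces = []
    for i, j, k in itertools.combinations(range(len(verts)), 3):
        edges = [np.linalg.norm(verts[a] - verts[b])
                 for a, b in ((i, j), (j, k), (i, k))]
        if np.allclose(edges, 2.0):
            faces.append((i, j, k))
    return faces


def face_centers(verts, faces):
    """Unit vectors pointing at the centroid of each face."""
    c = np.array([verts[list(f)].mean(axis=0) for f in faces])
    return c / np.linalg.norm(c, axis=1, keepdims=True)


def rotation_matrix(axis, angle):
    """Rodrigues' formula for a rotation by `angle` radians about `axis`."""
    x, y, z = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)


def face_permutation(centers, R, tol=1e-6):
    """Return perm with perm[i] = face that face i lands on under R,
    or None if R does not map the set of face centers onto itself."""
    rotated = centers @ R.T
    dist = np.linalg.norm(rotated[:, None, :] - centers[None, :, :], axis=-1)
    if dist.min(axis=1).max() > tol:
        return None
    return dist.argmin(axis=1)


if __name__ == "__main__":
    verts = icosahedron_vertices()
    centers = face_centers(verts, icosahedron_faces(verts))
    # 72 degrees about a vertex axis is a symmetry of the icosahedron.
    R = rotation_matrix(verts[0], 2 * np.pi / 5)
    print(face_permutation(centers, R))
    # An arbitrary angle about the same axis is not a symmetry.
    print(face_permutation(centers, rotation_matrix(verts[0], 0.3)))
```

Under such a symmetry rotation, per-face inputs are only reordered, which is why a model that treats faces as a token set can, in principle, respond to the rotation with a matching reordering of its outputs. This sketch does not address how pixels are ordered within each face.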
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Vision and Imaging · 3D Shape Modeling and Analysis
