Bending Reality: Distortion-aware Transformers for Adapting to Panoramic Semantic Segmentation

Jiaming Zhang; Kailun Yang; Chaoxiang Ma; Simon Reiß; Kunyu Peng; Rainer Stiefelhagen

arXiv:2203.01452·cs.CV·March 21, 2022

TL;DR

This paper introduces Trans4PASS, a transformer-based model with deformable components and domain adaptation techniques, significantly improving panoramic semantic segmentation performance while reducing the need for extensive labeled panoramic data.
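The "deformable components" rest on one basic operation: sampling the input at learned, fractional offsets from a regular grid instead of at fixed pixel positions. The sketch below illustrates only that sampling step, assuming offsets in the style of deformable convolution and bilinear interpolation with zero padding. The function names are hypothetical, the offsets are passed in rather than predicted by a learned layer, and the linear projection that a patch embedding would apply afterwards is omitted, so this is not the Trans4PASS implementation.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def bilinear_sample(img: Grid, y: float, x: float) -> float:
    """Bilinearly interpolate img at the fractional position (y, x).

    Positions outside the image are treated as zero (zero padding).
    """
    h, w = len(img), len(img[0])

    def px(r: int, c: int) -> float:
        return img[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    y0, x0 = math.floor(y), math.floor(x)
    dy, dx = y - y0, x - x0
    # Weighted sum of the four neighbouring pixels.
    return ((1 - dy) * (1 - dx) * px(y0, x0)
            + (1 - dy) * dx * px(y0, x0 + 1)
            + dy * (1 - dx) * px(y0 + 1, x0)
            + dy * dx * px(y0 + 1, x0 + 1))


def deformable_patch(img: Grid, top: int, left: int, k: int,
                     offsets: Sequence[Tuple[float, float]]) -> List[float]:
    """Sample a k x k patch whose regular grid points are each shifted by a (dy, dx) offset.

    Hypothetical interface: in a learned model the offsets would be predicted
    from the input features; here they are supplied directly.
    """
    if len(offsets) != k * k:
        raise ValueError("need one (dy, dx) offset per sampling point")
    values = []
    for i in range(k):
        for j in range(k):
            oy, ox = offsets[i * k + j]
            values.append(bilinear_sample(img, top + i + oy, left + j + ox))
    return values


img = [[float(4 * r + c) for c in range(4)] for r in range(4)]
print(deformable_patch(img, 0, 0, 2, [(0.0, 0.0)] * 4))  # [0.0, 1.0, 4.0, 5.0]
print(deformable_patch(img, 0, 0, 2, [(0.0, 0.5)] * 4))  # [0.5, 1.5, 4.5, 5.5]
```

With all offsets at zero the patch reduces to an ordinary fixed-grid patch; non-zero offsets let each sampling point move toward where the (possibly distorted) content actually lies.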

Contribution

It proposes a novel transformer architecture with deformable embedding and a mutual prototypical adaptation method for unsupervised domain adaptation in panoramic segmentation.
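This page does not define mutual prototypical adaptation in detail, so the sketch below only illustrates the general notion of class prototypes, under stated assumptions: a prototype is the mean feature vector of the pixels assigned to a class (ground-truth labels on the pinhole source, pseudo-labels on the panoramic target), the two domains' prototypes are fused by simple averaging, and a running memory is updated with momentum. The fusion rule, the momentum value, the ignore label 255 and all function names are hypothetical choices rather than the paper's definition, and the alignment loss that would consume these prototypes is omitted.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def class_prototypes(features: Sequence[Vector], labels: Sequence[int],
                     num_classes: int, ignore_index: int = 255) -> Dict[int, Vector]:
    """Average the feature vectors of all pixels assigned to each class."""
    dim = len(features[0])
    sums = {c: [0.0] * dim for c in range(num_classes)}
    counts = {c: 0 for c in range(num_classes)}
    for feat, lab in zip(features, labels):
        if lab == ignore_index:
            continue  # unlabeled or filtered-out pixels do not contribute
        counts[lab] += 1
        sums[lab] = [s + f for s, f in zip(sums[lab], feat)]
    # Classes absent from this batch get no prototype.
    return {c: [s / counts[c] for s in sums[c]]
            for c in range(num_classes) if counts[c] > 0}


def mutual_update(memory: Dict[int, Vector], source: Dict[int, Vector],
                  target: Dict[int, Vector], momentum: float = 0.9) -> Dict[int, Vector]:
    """Fuse source and target prototypes (assumed: plain average), then blend into memory."""
    updated = dict(memory)
    for c in set(source) | set(target):
        present = [p for p in (source.get(c), target.get(c)) if p is not None]
        fused = [sum(vals) / len(present) for vals in zip(*present)]
        if c in updated:
            # Exponential moving average keeps prototypes stable across batches.
            updated[c] = [momentum * m + (1 - momentum) * f
                          for m, f in zip(updated[c], fused)]
        else:
            updated[c] = fused
    return updated


# Source pixels with ground-truth labels; target pixels with (hypothetical) pseudo-labels.
src = class_prototypes([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [9.0, 9.0]], [0, 0, 1, 255], 3)
tgt = class_prototypes([[4.0, 0.0], [0.0, 4.0]], [0, 1], 3)
print(mutual_update({}, src, tgt))  # {0: [3.0, 0.0], 1: [0.0, 3.0]}
```

Keeping one shared memory for both domains is what makes the prototypes "mutual" in this toy version: target features are pulled toward centres that already encode the labeled source data.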

Findings

01

Achieves performance comparable to fully supervised models on indoor datasets.

02

Surpasses the previous state of the art by 14.39% mIoU on outdoor datasets.

03

Removes the need to label over 1,400 panoramic images.
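The findings are reported in mean Intersection-over-Union (mIoU), the standard semantic segmentation metric, so the 14.39% figure most naturally reads as a gain in percentage points of mIoU. A minimal pure-Python sketch of the usual computation from a confusion matrix follows; the ignore label 255 and the choice to skip classes that appear in neither ground truth nor prediction are assumptions, since benchmarks differ on that convention.

```python
from typing import List, Sequence


def confusion_matrix(gt: Sequence[int], pred: Sequence[int], num_classes: int,
                     ignore_index: int = 255) -> List[List[int]]:
    """Count pixels per (ground-truth class, predicted class) pair."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        if g != ignore_index:  # ignored pixels are not scored
            cm[g][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """IoU_c = TP / (TP + FP + FN), averaged over classes present in GT or prediction."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # class-c pixels predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp   # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    if not ious:
        raise ValueError("no scored pixels")
    return sum(ious) / len(ious)


cm = confusion_matrix([0, 0, 1, 1, 2, 255], [0, 1, 1, 1, 2, 0], num_classes=3)
print(cm)            # [[1, 1, 0], [0, 2, 0], [0, 0, 1]]
print(mean_iou(cm))  # mean of 1/2, 2/3 and 1
```

Because every class counts equally in the average, gains on rare classes (often the ones most distorted in panoramas) move mIoU as much as gains on large classes such as road or sky.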

Abstract

Panoramic images with their 360-degree directional view encompass exhaustive information about the surrounding space, providing a rich foundation for scene understanding. To unfold this potential in the form of robust panoramic segmentation models, large quantities of expensive, pixel-wise annotations are crucial for success. Such annotations are available, but predominantly for narrow-angle, pinhole-camera images which, off the shelf, serve as sub-optimal resources for training panoramic models. Distortions and the distinct image-feature distribution in 360-degree panoramas impede the transfer from the annotation-rich pinhole domain and therefore come with a big dent in performance. To get around this domain difference and bring together semantic annotations from pinhole- and 360-degree surround-visuals, we propose to learn object deformations and panoramic image distortions in the…
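The distortions mentioned in the abstract are easiest to see for the equirectangular projection, a common storage format for 360-degree panoramas (whether a particular dataset uses it is an assumption here). Every image row spans the full 360 degrees of longitude, so a row at latitude φ is stretched horizontally by a factor of 1/cos φ relative to the equator. The helper names below are hypothetical, and the code illustrates the geometry only, not any part of Trans4PASS.

```python
import math


def row_latitude(row: int, height: int) -> float:
    """Latitude in radians at the centre of an image row (row 0 is the top, near +90 degrees)."""
    return math.pi / 2 - math.pi * (row + 0.5) / height


def horizontal_stretch(row: int, height: int) -> float:
    """How much wider an object appears in this row than it would at the equator."""
    return 1.0 / math.cos(row_latitude(row, height))


height = 512
for row in (0, 64, 128, 256):
    print(row, round(math.degrees(row_latitude(row, height)), 2),
          round(horizontal_stretch(row, height), 3))
```

At 45 degrees of latitude the factor is √2 (about 1.41), and it grows without bound toward the poles. An object's appearance therefore depends on where it sits in the panorama, which is one source of the pinhole-to-panorama gap the abstract describes.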

Code & Models

Repositories

jamycheung/trans4pass
PyTorch · Official

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Advanced Vision and Imaging

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Absolute Position Encodings · Byte Pair Encoding · Softmax · Position-Wise Feed-Forward Layer · Residual Connection · Layer Normalization