Bending Reality: Distortion-aware Transformers for Adapting to Panoramic Semantic Segmentation
Jiaming Zhang, Kailun Yang, Chaoxiang Ma, Simon Reiß, Kunyu Peng, Rainer Stiefelhagen

TL;DR
This paper introduces Trans4PASS, a transformer-based model with deformable components and domain adaptation techniques, significantly improving panoramic semantic segmentation performance while reducing the need for extensive labeled panoramic data.
Contribution
It proposes a novel transformer architecture with deformable embedding and a mutual prototypical adaptation method for unsupervised domain adaptation in panoramic segmentation.
Findings
Achieves comparable performance to fully-supervised models on indoor datasets.
Surpasses the previous state of the art by 14.39% mIoU on outdoor datasets.
Reduces the need for over 1,400 labeled panoramic images.
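The mIoU figure above is mean intersection-over-union, the standard semantic segmentation metric: for each class, the number of pixels correctly assigned to it is divided by the number of pixels that either the prediction or the ground truth assigns to it, and the per-class ratios are averaged. The following is a minimal sketch of that computation on flattened label lists. The function name `mean_iou` and the toy labels are illustrative assumptions, not taken from the paper's code, and evaluation protocols can differ in details such as how absent classes or ignore labels are handled.

```python
def mean_iou(pred, target, num_classes):
    """Average per-class IoU over classes that occur in pred or target.

    pred and target are equal-length sequences of integer class labels
    (e.g. a flattened segmentation map). Classes absent from both are
    skipped, which is one common convention (an assumption here).
    """
    ious = []
    for c in range(num_classes):
        # Pixels labelled c by both prediction and ground truth.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labelled c by either prediction or ground truth.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes and six pixels.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, 3)
print(f"mIoU = {score:.2%}")
```

Here the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5.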
Abstract
Panoramic images with their 360-degree directional view encompass exhaustive information about the surrounding space, providing a rich foundation for scene understanding. To unfold this potential in the form of robust panoramic segmentation models, large quantities of expensive, pixel-wise annotations are crucial for success. Such annotations are available, but predominantly for narrow-angle, pinhole-camera images which, off the shelf, serve as sub-optimal resources for training panoramic models. Distortions and the distinct image-feature distribution in 360-degree panoramas impede the transfer from the annotation-rich pinhole domain and therefore come with a big dent in performance. To get around this domain difference and bring together semantic annotations from pinhole- and 360-degree surround-visuals, we propose to learn object deformations and panoramic image distortions in the…
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Advanced Vision and Imaging
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Absolute Position Encodings · Byte Pair Encoding · Softmax · Position-Wise Feed-Forward Layer · Residual Connection · Layer Normalization
