Superpixel Transformers for Efficient Semantic Segmentation
Alex Zihao Zhu, Jieru Mei, Siyuan Qiao, Hang Yan, Yukun Zhu, Liang-Chieh Chen, Henrik Kretzschmar

TL;DR
This paper introduces Superpixel Transformers, a novel approach that combines superpixel over-segmentation with transformer-based global context modeling to achieve efficient and accurate semantic segmentation.
Contribution
The paper proposes a superpixel-based transformer framework that improves computational efficiency while maintaining state-of-the-art accuracy in semantic segmentation.
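A rough count of attention pairs shows where the efficiency gain comes from. The sketch below compares full self-attention over every pixel of a Cityscapes-sized image (1024 × 2048) with a two-stage pipeline in which each pixel scores only nearby superpixels and global self-attention runs over superpixels alone. The stride-16 superpixel grid and the 3 × 3 neighborhood are illustrative assumptions, not settings reported by the paper. The counts also ignore channel width, attention heads, and the cost of computing the pixel features themselves.

```python
# Illustrative arithmetic only: the stride and neighborhood size are assumptions.

def attention_pairs(num_tokens: int) -> int:
    """Query-key pairs scored by full self-attention over num_tokens tokens."""
    return num_tokens * num_tokens


def superpixel_pairs(height: int, width: int, stride: int, neighborhood: int = 9) -> int:
    """Local pixel-to-superpixel pairs plus global self-attention over superpixels."""
    num_pixels = height * width
    num_superpixels = (height // stride) * (width // stride)
    local_pairs = num_pixels * neighborhood  # each pixel scores only a few nearby superpixels
    return local_pairs + attention_pairs(num_superpixels)


H, W, STRIDE = 1024, 2048, 16  # Cityscapes resolution; the stride is hypothetical
dense = attention_pairs(H * W)
sparse = superpixel_pairs(H, W, STRIDE)
print(f"dense pixel self-attention: {dense:.3e} pairs")
print(f"superpixel pipeline:        {sparse:.3e} pairs")
print(f"reduction factor:           {dense / sparse:,.0f}x")
```

The dense term grows quadratically with the number of pixels. The superpixel pipeline is linear in pixels and quadratic only in the much smaller number of superpixels.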
Findings
Achieves state-of-the-art accuracy on the Cityscapes and ADE20K datasets.
Reduces parameter count and latency relative to convolution-based methods.
Enriches superpixel features with global context using self-attention.
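Global context enrichment amounts to standard multi-head self-attention applied to a short sequence of superpixel tokens instead of to every pixel. The NumPy sketch below shows one such layer with a residual connection. It assumes randomly initialized projection matrices and omits layer normalization and the feed-forward block. The function name `multi_head_self_attention` and all shapes are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def multi_head_self_attention(tokens, w_q, w_k, w_v, w_o, num_heads):
    """One global multi-head self-attention layer (with residual) over superpixel tokens.

    tokens: (K, C) array, one row per superpixel; C must be divisible by num_heads.
    w_q, w_k, w_v, w_o: (C, C) projection matrices (hypothetical, random below).
    """
    num_tokens, channels = tokens.shape
    head_dim = channels // num_heads

    def split_heads(x):  # (K, C) -> (heads, K, head_dim)
        return x.reshape(num_tokens, num_heads, head_dim).transpose(1, 0, 2)

    q = split_heads(tokens @ w_q)
    k = split_heads(tokens @ w_k)
    v = split_heads(tokens @ w_v)
    logits = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)  # (heads, K, K): every superpixel sees every other
    logits -= logits.max(axis=-1, keepdims=True)            # subtract row max for numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)          # softmax over keys
    mixed = (weights @ v).transpose(1, 0, 2).reshape(num_tokens, channels)  # merge heads
    return tokens + mixed @ w_o                             # residual connection


rng = np.random.default_rng(0)
K, C, HEADS = 24, 16, 4
superpixels = rng.normal(size=(K, C))
w_q, w_k, w_v, w_o = (rng.normal(scale=C ** -0.5, size=(C, C)) for _ in range(4))
enriched = multi_head_self_attention(superpixels, w_q, w_k, w_v, w_o, HEADS)
print(enriched.shape)  # (24, 16)
```

Because K is small, the K × K attention matrix stays cheap even though every superpixel attends to every other one. That global receptive field is what dense per-pixel attention cannot afford.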
Abstract
Semantic segmentation, which aims to classify every pixel in an image, is a key task in machine perception, with many applications across robotics and autonomous driving. Due to the high dimensionality of this task, most existing approaches use local operations, such as convolutions, to generate per-pixel features. However, these methods are typically unable to effectively leverage global context information due to the high computational costs of operating on a dense image. In this work, we propose a solution to this issue by leveraging the idea of superpixels, an over-segmentation of the image, and combining them with a modern transformer framework. In particular, our model learns to decompose the pixel space into a spatially low-dimensional superpixel space via a series of local cross-attentions. We then apply multi-head self-attention to the superpixels to enrich the superpixel…
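The decomposition step the abstract describes, mapping pixels to superpixels through local cross-attention, can be sketched as a soft assignment in which each pixel scores only the superpixels in the 3 × 3 grid cells around its own. This is a minimal NumPy sketch under assumptions not taken from the paper. Superpixels are initialized by average-pooling a regular stride-S grid. Affinities are plain scaled dot products with no learned projections. Only a single assignment round is shown. The helper names `local_superpixel_assignment` and `pool_to_superpixels` are hypothetical.

```python
import numpy as np


def local_superpixel_assignment(pixel_feats, sp_feats, stride):
    """Softmax each pixel's affinity over the 3x3 superpixel cells around it (hypothetical helper).

    pixel_feats: (H, W, C); sp_feats: (H // stride, W // stride, C).
    Returns assoc (H, W, 9) and idx (H, W, 9) into the flattened grid (-1 = outside grid).
    """
    H, W, C = pixel_feats.shape
    h, w, _ = sp_feats.shape
    assert H % stride == 0 and W % stride == 0 and (h, w) == (H // stride, W // stride)
    flat_sp = sp_feats.reshape(h * w, C)
    ys, xs = np.meshgrid(np.arange(H) // stride, np.arange(W) // stride, indexing="ij")
    idx = np.full((H, W, 9), -1)
    offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    for n, (dy, dx) in enumerate(offsets):
        ny, nx = ys + dy, xs + dx
        valid = (ny >= 0) & (ny < h) & (nx >= 0) & (nx < w)
        idx[..., n] = np.where(valid, ny * w + nx, -1)
    neigh = flat_sp[np.clip(idx, 0, None)]                          # (H, W, 9, C)
    logits = np.einsum("ijc,ijnc->ijn", pixel_feats, neigh) / np.sqrt(C)
    logits = np.where(idx >= 0, logits, -np.inf)                    # mask cells outside the grid
    logits -= logits.max(axis=-1, keepdims=True)                    # own cell is always valid
    assoc = np.exp(logits)
    assoc /= assoc.sum(axis=-1, keepdims=True)
    return assoc, idx


def pool_to_superpixels(pixel_feats, assoc, idx, num_superpixels):
    """Association-weighted average of pixel features for each superpixel (hypothetical helper)."""
    C = pixel_feats.shape[-1]
    valid = idx >= 0
    feats = np.broadcast_to(pixel_feats[:, :, None, :], idx.shape + (C,))[valid]  # (M, C)
    weights, targets = assoc[valid], idx[valid]                                   # (M,), (M,)
    sums = np.zeros((num_superpixels, C))
    mass = np.zeros(num_superpixels)
    np.add.at(sums, targets, weights[:, None] * feats)
    np.add.at(mass, targets, weights)
    return sums / np.maximum(mass, 1e-6)[:, None]


rng = np.random.default_rng(0)
H, W, C, S = 32, 48, 16, 8
pixels = rng.normal(size=(H, W, C))
init_sp = pixels.reshape(H // S, S, W // S, S, C).mean(axis=(1, 3))  # average-pool initialization
assoc, idx = local_superpixel_assignment(pixels, init_sp, S)
sp = pool_to_superpixels(pixels, assoc, idx, init_sp.shape[0] * init_sp.shape[1])
print(assoc.shape, sp.shape)  # (32, 48, 9) (24, 16)
```

Limiting each pixel to nine candidate superpixels keeps this stage linear in the number of pixels. The all-pairs interaction is deferred to the much smaller superpixel set. The same `assoc` weights could also be used in reverse to scatter superpixel predictions back to pixels.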
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Medical Image Segmentation Techniques
