SSformer: A Lightweight Transformer for Semantic Segmentation
Wentao Shi, Jing Xu, Pan Gao

TL;DR
SSformer is a lightweight transformer model tailored for semantic segmentation. It combines a hierarchical design with a novel decoder to achieve competitive accuracy at reduced model size and computational cost.
Contribution
The paper introduces SSformer, a new lightweight transformer architecture that effectively integrates local and global attention for semantic segmentation.
Findings
Achieves comparable mIoU to state-of-the-art models.
Maintains smaller model size and lower computational complexity.
Demonstrates effectiveness of hierarchical design with a specialized decoder.
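For readers unfamiliar with the metric named above, mIoU (mean intersection over union) averages the per-class overlap between predicted and ground-truth label maps. The following is a minimal pure-Python sketch. The function name mean_iou, the toy label maps, and the choice to skip classes absent from both maps are illustrative assumptions, not the paper's evaluation code.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that appear in the prediction or the ground truth.

    `pred` and `target` are equally sized 2D lists of integer class labels.
    Skipping classes absent from both maps is an assumption of this sketch.
    """
    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            pred_count[p] += 1
            target_count[t] += 1
            if p == t:
                intersection[p] += 1
    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # ignore classes that never occur
            ious.append(intersection[c] / union)
    return sum(ious) / len(ious)


# Toy 2x3 label maps with three classes (hypothetical data).
pred = [[0, 0, 1], [1, 2, 2]]
target = [[0, 1, 1], [1, 2, 0]]
score = mean_iou(pred, target, num_classes=3)
print(score)
```

On the toy maps, the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5. Benchmark toolkits usually accumulate the counts over the whole dataset before dividing, rather than averaging per image.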
Abstract
It is widely believed that Transformers perform better than convolutional neural networks in semantic segmentation. Nevertheless, the original Vision Transformer may lack the inductive biases of local neighborhoods and has high time complexity. Recently, Swin Transformer set new records in various vision tasks by using a hierarchical architecture and shifted windows while being more efficient. However, as Swin Transformer is specifically designed for image classification, it may achieve suboptimal performance on dense prediction-based segmentation tasks. Further, simply combining Swin Transformer with existing methods would increase the model size and parameter count of the final segmentation model. In this paper, we rethink the Swin Transformer for semantic segmentation and design a lightweight yet effective transformer model, called SSformer. In this model, considering the…
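The shifted-window idea the abstract attributes to Swin Transformer can be illustrated with a small sketch. It shows only how a token grid is split into local windows, and how a cyclic shift by half a window makes the next layer's windows straddle the previous boundaries. The helper names cyclic_shift and window_partition and the 4x4 toy grid are assumptions for illustration. Attention itself and the mask Swin applies to wrapped-around tokens are omitted, and nothing here is taken from the SSformer implementation.

```python
def cyclic_shift(grid, shift):
    """Roll a 2D grid up and left by `shift` cells, wrapping around the edges."""
    h, w = len(grid), len(grid[0])
    return [[grid[(r + shift) % h][(c + shift) % w] for c in range(w)]
            for r in range(h)]


def window_partition(grid, window):
    """Split a 2D grid into non-overlapping window x window blocks, row-major."""
    h, w = len(grid), len(grid[0])
    assert h % window == 0 and w % window == 0, "grid must tile evenly"
    blocks = []
    for r0 in range(0, h, window):
        for c0 in range(0, w, window):
            blocks.append([grid[r][c0:c0 + window]
                           for r in range(r0, r0 + window)])
    return blocks


# Toy 4x4 grid of token indices 0..15 (hypothetical), window size 2.
grid = [[4 * r + c for c in range(4)] for r in range(4)]
window = 2

# Regular windows: attention would be computed inside each 2x2 block.
regular = window_partition(grid, window)

# Shifted windows: roll by half a window, then partition again, so each
# new block mixes tokens from neighbouring regular windows.
shifted = window_partition(cyclic_shift(grid, window // 2), window)

print(regular[0])  # tokens from one regular window
print(shifted[0])  # tokens drawn from four different regular windows
```

Because attention is restricted to each window, the cost grows linearly with the number of windows instead of quadratically with the number of tokens. Alternating regular and shifted layers lets information cross window boundaries.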
Taxonomy
Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · Domain Adaptation and Few-Shot Learning
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Layer Normalization · Adam · Byte Pair Encoding · Label Smoothing
