SSformer: A Lightweight Transformer for Semantic Segmentation

Wentao Shi; Jing Xu; Pan Gao

arXiv:2208.02034 · cs.CV · August 4, 2022 · 1 citation

TL;DR

SSformer is a lightweight transformer model tailored for semantic segmentation, combining hierarchical design and a novel decoder to achieve competitive accuracy with reduced model size and computational cost.

Contribution

The paper introduces SSformer, a new lightweight transformer architecture that effectively integrates local and global attention for semantic segmentation.

Findings

01 Achieves mIoU comparable to state-of-the-art models.

02 Maintains a smaller model size and lower computational complexity.

03 Demonstrates the effectiveness of a hierarchical design with a specialized decoder.
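
For readers unfamiliar with the metric named in the findings, mIoU (mean intersection-over-union) averages the per-class overlap between predicted and ground-truth label maps. The sketch below is a minimal pure-Python illustration on a toy example; the function name `mean_iou` and the toy labels are assumptions for illustration and are not taken from the paper or its repository. Evaluation toolkits commonly accumulate these counts over a whole dataset rather than a single image.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that occur in pred or target.

    pred, target: flat sequences of integer class labels, one per pixel.
    """
    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both maps
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labelled c in at least one map
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both maps
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example (hypothetical data): 6 pixels, 3 classes
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
# Per-class IoU here is 1/3, 2/3 and 1/2, so the mean is about 0.5
print(mean_iou(pred, target, 3))
```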

Abstract

It is widely believed that Transformers perform better than convolutional neural networks in semantic segmentation. Nevertheless, the original Vision Transformer may lack the inductive bias toward local neighborhoods and has a high time complexity. Recently, Swin Transformer set new records in various vision tasks by using a hierarchical architecture and shifted windows while being more efficient. However, because Swin Transformer is specifically designed for image classification, it may achieve suboptimal performance on dense prediction-based segmentation tasks. Furthermore, simply combining Swin Transformer with existing methods would increase the model size and parameter count of the final segmentation model. In this paper, we rethink the Swin Transformer for semantic segmentation and design a lightweight yet effective transformer model, called SSformer. In this model, considering the…
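
The shifted-window idea the abstract attributes to Swin Transformer can be sketched roughly as follows: a feature map is split into non-overlapping windows, and alternate layers cyclically shift the map first so that the new windows straddle the previous window boundaries. This is a NumPy illustration of the partitioning step only, under the assumption of a single (H, W, C) feature map whose sides are divisible by the window size; the function names are hypothetical, the attention computation and its boundary mask are omitted, and nothing here is specific to SSformer's own design.

```python
import numpy as np


def window_partition(x, window_size):
    """Split an (H, W, C) map into (num_windows, window_size, window_size, C)."""
    H, W, C = x.shape
    x = x.reshape(H // window_size, window_size, W // window_size, window_size, C)
    # Bring the two window-grid axes together, then flatten them
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, window_size, window_size, C)


def shifted_window_partition(x, window_size, shift):
    """Cyclically shift the map up and left by `shift`, then partition."""
    shifted = np.roll(x, shift=(-shift, -shift), axis=(0, 1))
    return window_partition(shifted, window_size)


# Toy 8x8 single-channel map whose values equal the flattened pixel index
feat = np.arange(8 * 8).reshape(8, 8, 1)
regular = window_partition(feat, 4)             # four 4x4 windows
shifted = shifted_window_partition(feat, 4, 2)  # windows offset by half a window
print(regular.shape, shifted.shape)
```

Shifting by half the window size means each shifted window mixes pixels from up to four neighboring regular windows, which is how information can cross window boundaries without computing global attention.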

Code & Models

Repositories

shiwt03/ssformer · PyTorch · Official

Taxonomy

Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · Domain Adaptation and Few-Shot Learning

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Layer Normalization · Adam · Byte Pair Encoding · Label Smoothing