IncepFormer: Efficient Inception Transformer with Pyramid Pooling for Semantic Segmentation

Lihua Fu; Haoyue Tian; Xiangping Bryce Zhai; Pan Gao; Xiaojiang Peng

arXiv:2212.03035 · cs.CV · December 7, 2022 · 5 cites

TL;DR

IncepFormer is a Transformer-based architecture for semantic segmentation that combines a pyramid-structured encoder with Inception-like modules to achieve high accuracy and efficiency across multiple benchmarks.

Contribution

It introduces a pyramid-structured Transformer encoder and integrates Inception-like modules with depth-wise convolutions for improved local and global feature extraction.

Findings

01 Achieves 47.7% mIoU on ADE20K with fewer parameters and FLOPs.

02 Attains 82.0% mIoU on Cityscapes with 39.6M parameters.

03 Outperforms state-of-the-art methods in accuracy and speed.
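
The findings above are reported in mIoU (mean intersection over union). As a reminder of what that number measures, here is a minimal sketch in plain Python. It is not taken from the paper or its repository; the function name `mean_iou` and the flat label-list input are assumptions made for illustration, and benchmark conventions such as an ignored label are deliberately left out.

```python
from typing import List


def mean_iou(pred: List[int], target: List[int], num_classes: int) -> float:
    """Mean IoU over classes that occur in the prediction or the ground truth.

    `pred` and `target` are flattened per-pixel class labels of equal length.
    (Hypothetical helper for illustration; ignore labels are not handled.)
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    inter = [0] * num_classes  # pixels where prediction and target agree on class c
    union = [0] * num_classes  # pixels labelled c by prediction or by target

    for p, t in zip(pred, target):
        if p == t:
            inter[p] += 1
            union[p] += 1
        else:
            # A mismatched pixel enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1

    # Classes absent from both prediction and target are skipped.
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Class 0: IoU = 1/2, class 1: IoU = 2/3, so the mean is 7/12.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

A reported value such as 47.7% is this average taken over all classes of the dataset and accumulated over the whole validation set, not per image.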

Abstract

Semantic segmentation usually benefits from global contexts, fine localisation information, multi-scale features, etc. To advance Transformer-based segmenters with these aspects, we present a simple yet powerful semantic segmentation architecture, termed IncepFormer. IncepFormer makes two critical contributions. First, it introduces a novel pyramid-structured Transformer encoder which harvests global context and fine localisation features simultaneously. These features are concatenated and fed into a convolution layer for final per-pixel prediction. Second, IncepFormer integrates an Inception-like architecture with depth-wise convolutions, and a lightweight feed-forward module in each self-attention layer, efficiently obtaining rich local multi-scale object features. Extensive experiments on five benchmarks show that our IncepFormer is superior to state-of-the-art methods…
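
The abstract's "Inception-like architecture with depth-wise convolutions" can be pictured as several parallel branches, each filtering every channel independently at a different kernel size, with the branch outputs concatenated along the channel axis. The sketch below illustrates only that general idea in plain Python. It is not the authors' implementation: the kernel sizes `(3, 5, 7)`, the fixed averaging kernel standing in for learned weights, and the function names are all assumptions made for illustration.

```python
from typing import List, Sequence

Grid = List[List[float]]  # one channel, H x W


def depthwise_box_conv(channels: List[Grid], k: int) -> List[Grid]:
    """Depth-wise k x k convolution (k odd) with zero padding, so H and W are kept.

    Each channel is filtered on its own, which is what makes the operation
    depth-wise. A uniform averaging kernel stands in for learned weights.
    """
    pad = k // 2
    out: List[Grid] = []
    for ch in channels:
        h, w = len(ch), len(ch[0])
        res = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                acc = 0.0
                for di in range(-pad, pad + 1):
                    for dj in range(-pad, pad + 1):
                        y, x = i + di, j + dj
                        if 0 <= y < h and 0 <= x < w:  # zero padding outside the map
                            acc += ch[y][x]
                res[i][j] = acc / (k * k)
        out.append(res)
    return out


def inception_like_branches(
    channels: List[Grid], kernel_sizes: Sequence[int] = (3, 5, 7)
) -> List[Grid]:
    """Run one depth-wise branch per kernel size and concatenate along channels."""
    fused: List[Grid] = []
    for k in kernel_sizes:
        fused.extend(depthwise_box_conv(channels, k))
    return fused


if __name__ == "__main__":
    feature_map = [[[1.0] * 9 for _ in range(9)] for _ in range(2)]  # 2 channels, 9 x 9
    multi_scale = inception_like_branches(feature_map)
    print(len(multi_scale), len(multi_scale[0]), len(multi_scale[0][0]))
```

Because each branch touches one channel at a time, its cost grows with the number of channels rather than with its square, which is the usual reason depth-wise convolutions are chosen when efficiency matters.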

Code & Models

Repositories

shendu0321/incepformer (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning

Methods: Multi-Head Attention · Attention Is All You Need · Label Smoothing · Layer Normalization · Softmax · Dropout · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Convolution · Linear Layer