Fully Transformer Networks for Semantic Image Segmentation
Sitong Wu, Tianyi Wu, Fangjian Lin, Shengwei Tian, Guodong Guo

TL;DR
This paper introduces Fully Transformer Networks (FTN), a framework for semantic image segmentation that uses a Pyramid Group Transformer encoder and a Feature Pyramid Transformer for multi-level feature fusion, and reports better results than existing methods on PASCAL Context, ADE20K, COCO-Stuff, and CelebAMask-HQ.
Contribution
The paper proposes a pure Transformer-based encoder-decoder framework for semantic segmentation, combining hierarchical feature learning and multi-level feature fusion, outperforming existing methods.
Findings
Outperforms existing methods on PASCAL Context, ADE20K, COCO-Stuff, and CelebAMask-HQ.
Introduces a Pyramid Group Transformer (PGT) for efficient hierarchical feature learning.
Introduces a Feature Pyramid Transformer (FPT) for effective multi-level feature fusion.
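To give a rough sense of why a group-based encoder can be cheaper than standard global self-attention, the sketch below counts query-key pairs over a token grid. It assumes that "group" means splitting the grid into non-overlapping spatial groups with attention computed only inside each group; this summary does not spell out the exact grouping, so the function name `attention_pairs`, the `groups_per_side` parameter, and the 64 x 64 grid are illustrative assumptions rather than the paper's configuration.

```python
def attention_pairs(height, width, groups_per_side=1):
    """Count query-key pairs scored by self-attention on a height x width token grid.

    groups_per_side=1 models global attention (every token attends to every token).
    Larger values split the grid into groups_per_side x groups_per_side
    non-overlapping spatial groups, with attention restricted to each group.
    This grouping scheme is an assumption made for illustration.
    """
    if height % groups_per_side or width % groups_per_side:
        raise ValueError("grid must divide evenly into groups")
    tokens_per_group = (height // groups_per_side) * (width // groups_per_side)
    num_groups = groups_per_side * groups_per_side
    # Each group scores all of its tokens against each other: tokens_per_group ** 2.
    return num_groups * tokens_per_group ** 2


if __name__ == "__main__":
    global_cost = attention_pairs(64, 64)          # one group covering all 4096 tokens
    grouped_cost = attention_pairs(64, 64, 8)      # 64 groups of 8 x 8 tokens each
    print(global_cost, grouped_cost, global_cost // grouped_cost)
```

Under these assumptions, the number of scored pairs falls by a factor equal to the number of groups, which is the kind of saving that group-wise attention trades against a smaller receptive field per layer.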
Abstract
Transformers have shown impressive performance in various natural language processing and computer vision tasks, owing to their ability to model long-range dependencies. Recent progress has demonstrated that combining such Transformers with CNN-based semantic image segmentation models is very promising. However, how well a pure Transformer-based approach can perform for image segmentation has not yet been well studied. In this work, we explore a novel framework for semantic image segmentation: encoder-decoder based Fully Transformer Networks (FTN). Specifically, we first propose a Pyramid Group Transformer (PGT) as the encoder for progressively learning hierarchical features while reducing the computational complexity of the standard Visual Transformer (ViT). Then, we propose a Feature Pyramid Transformer (FPT) to fuse semantic-level and spatial-level information from…
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adam · Label Smoothing · Residual Connection · Dense Connections · Softmax · Dropout
