MetaSeg: MetaFormer-based Global Contexts-aware Network for Efficient Semantic Segmentation
Beoungwoo Kang, Seunghun Moon, Yubin Cho, Hyunwoo Yu, Suk-Ju Kang

TL;DR
MetaSeg introduces a MetaFormer-based architecture for semantic segmentation that effectively captures global context while maintaining computational efficiency, outperforming previous methods on multiple benchmarks.
Contribution
The paper extends the MetaFormer architecture to both the backbone and the decoder of a semantic segmentation network, and introduces a novel self-attention module that reduces the channel dimension of its inputs for efficiency.
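The idea behind a channel-reduced self-attention module can be illustrated with a minimal sketch. It assumes the following: queries and keys are projected to a small per-head width `d_r`, while values and the output projection keep the full channel width `c`. It also assumes the key/value source is average-pooled spatially, an option borrowed from spatial-reduction attention designs; `pool=1` disables it. All function names, weight shapes, and the choice `d_r = 1` are hypothetical. The sketch does not reproduce the paper's exact CRA module.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def avg_pool_tokens(x, h, w, r):
    # Average-pool an (h*w, c) row-major token map with an r x r window, stride r.
    c = x.shape[-1]
    grid = x.reshape(h // r, r, w // r, r, c)
    return grid.mean(axis=(1, 3)).reshape(-1, c)


def channel_reduction_attention(x, h, w, wq, wk, wv, wo, num_heads, pool=2):
    """Hypothetical channel-reduced attention on x of shape (h*w, c).

    wq, wk: (c, num_heads * d_r) project queries/keys to a small width d_r.
    wv, wo: (c, c) keep the value/output path at the full width c.
    pool:   assumed spatial pooling of the key/value source (1 disables it).
    """
    n, c = x.shape
    d_r = wq.shape[1] // num_heads
    d_v = c // num_heads
    kv_src = avg_pool_tokens(x, h, w, pool)
    m = kv_src.shape[0]
    # Split heads: (heads, tokens, channels-per-head).
    q = (x @ wq).reshape(n, num_heads, d_r).transpose(1, 0, 2)
    k = (kv_src @ wk).reshape(m, num_heads, d_r).transpose(1, 0, 2)
    v = (kv_src @ wv).reshape(m, num_heads, d_v).transpose(1, 0, 2)
    # Scores cost only n * m * d_r per head because d_r is small.
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_r))  # (heads, n, m)
    out = (attn @ v).transpose(1, 0, 2).reshape(n, c)        # merge heads
    return out @ wo, attn


# Usage on a random 8x8 feature map with 32 channels, 4 heads, and d_r = 1.
rng = np.random.default_rng(0)
h, w, c, heads, d_r = 8, 8, 32, 4, 1
x = rng.standard_normal((h * w, c))
wq = rng.standard_normal((c, heads * d_r)) * 0.1
wk = rng.standard_normal((c, heads * d_r)) * 0.1
wv = rng.standard_normal((c, c)) * 0.1
wo = rng.standard_normal((c, c)) * 0.1
y, attn = channel_reduction_attention(x, h, w, wq, wk, wv, wo, heads)
print(y.shape, attn.shape)  # (64, 32) (4, 64, 16)
```

With `d_r = 1`, each attention logit is a product of two scalars per head. The score matrix is therefore cheap to form, while every token still attends to the whole (pooled) map and the values retain all `c` channels.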
Findings
Outperforms state-of-the-art methods on ADE20K, Cityscapes, COCO-stuff, and Synapse datasets.
Uses a novel Channel Reduction Attention (CRA) module for efficient global context extraction.
Demonstrates the effectiveness of MetaFormer architecture in both backbone and decoder for segmentation.
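The efficiency claim can be checked with a rough count of multiply-accumulates (MACs) in the core of attention, softmax(QK^T)V. This count ignores the linear projections and the softmax itself. The feature-map size, the 2x2 key/value pooling, and the 256-channel width below are hypothetical numbers chosen for illustration. They are not the paper's configuration.

```python
def attention_macs(n, m, d_qk, d_v):
    """MACs for softmax(Q K^T) V, ignoring projections and the softmax."""
    scores = n * m * d_qk      # Q (n x d_qk) times K^T (d_qk x m)
    weighted = n * m * d_v     # A (n x m) times V (m x d_v)
    return scores + weighted


n = 64 * 64      # queries: a hypothetical 64x64 feature map
m = n // 4       # keys/values after a hypothetical 2x2 pooling
c = 256          # hypothetical channel width

full = attention_macs(n, m, c, c)     # query/key width equal to c
reduced = attention_macs(n, m, 1, c)  # query/key width reduced to 1
print(full, reduced)  # 2147483648 1077936128
```

Reducing only the query/key width removes almost all of the score cost, so this term drops by a factor of 512/257, just under 2. The remaining cost comes from the full-width value path, which is what preserves the representational capacity of the output.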
Abstract
Beyond the Transformer, it is important to explore how to exploit the capacity of the MetaFormer, an architecture that is fundamental to the performance improvements of the Transformer. Previous studies have exploited it only for the backbone network. Unlike previous studies, we explore the capacity of the MetaFormer architecture more extensively in the semantic segmentation task. We propose a powerful semantic segmentation network, MetaSeg, which leverages the MetaFormer architecture from the backbone to the decoder. Our MetaSeg shows that the MetaFormer architecture plays a significant role in capturing the useful contexts for the decoder as well as for the backbone. In addition, recent segmentation methods have shown that using a CNN-based backbone for extracting the spatial information and a decoder for extracting the global information is more effective than using a…
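The MetaFormer abstraction treats a Transformer block as two residual sub-blocks, each preceded by normalization. The first is a token mixer and the second is a channel MLP. Self-attention is only one possible choice of token mixer. The following is a minimal NumPy sketch of that general block. The global-mean mixer is a hypothetical toy chosen to show that the mixer is pluggable. This sketch is not MetaSeg's backbone or decoder block, and the layer norm here omits learned affine parameters for brevity.

```python
import numpy as np


def layer_norm(x, eps=1e-6):
    # Normalize each token over its channels (no learned scale/shift, for brevity).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def gelu(x):
    # tanh approximation of GELU.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def metaformer_block(x, token_mixer, w1, w2):
    """Generic MetaFormer block on (n, c) tokens with a pluggable token mixer."""
    x = x + token_mixer(layer_norm(x))        # token-mixing sub-block
    x = x + gelu(layer_norm(x) @ w1) @ w2     # channel-MLP sub-block
    return x


def mean_mixer(t):
    # Hypothetical toy mixer: every token receives the global average token.
    return np.broadcast_to(t.mean(axis=0, keepdims=True), t.shape)


# Usage: 16 tokens with 8 channels and an MLP expansion ratio of 4.
rng = np.random.default_rng(0)
n, c = 16, 8
x = rng.standard_normal((n, c))
w1 = rng.standard_normal((c, 4 * c)) * 0.1
w2 = rng.standard_normal((4 * c, c)) * 0.1
y = metaformer_block(x, mean_mixer, w1, w2)
y_no_mix = metaformer_block(x, np.zeros_like, w1, w2)
```

Swapping `mean_mixer` for an attention function turns this into a standard Transformer block. This separation is the sense in which the MetaFormer structure, rather than any specific mixer, can be reused in both the backbone and the decoder.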
Videos
MetaSeg: MetaFormer-Based Global Contexts-Aware Network for Efficient Semantic Segmentation· youtube
Taxonomy
Topics: Robotics and Automated Systems · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization
Methods: Linear Layer · Layer Normalization · Multi-Head Attention · Attention Is All You Need · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Softmax · Absolute Position Encodings · Dense Connections
