MetaSeg: MetaFormer-based Global Contexts-aware Network for Efficient Semantic Segmentation

Beoungwoo Kang; Seunghun Moon; Yubin Cho; Hyunwoo Yu; Suk-Ju Kang

arXiv:2408.07576 · cs.CV · August 16, 2024

TL;DR

MetaSeg introduces a MetaFormer-based architecture for semantic segmentation that effectively captures global context while maintaining computational efficiency, outperforming previous methods on multiple benchmarks.

Contribution

The paper extends the MetaFormer architecture from the backbone to the decoder of a semantic segmentation network, and introduces a self-attention module that uses channel reduction to lower the cost of attention.
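To make the idea of channel reduction in self-attention concrete, here is a minimal NumPy sketch. It assumes the reduction is applied to the query and key projections, shrinking them from C channels to a small width r (r = 1 here), while the values keep all C channels. The function names, weight shapes, and the choice r = 1 are illustrative assumptions, not the authors' implementation, which may include further components.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def channel_reduced_attention(x, w_q, w_k, w_v):
    """Single-head attention with reduced query/key width (hypothetical sketch).

    x:   (N, C) token features
    w_q: (C, r) query projection, r << C
    w_k: (C, r) key projection
    w_v: (C, C) value projection (channels kept)
    """
    q = x @ w_q                                   # (N, r)
    k = x @ w_k                                   # (N, r)
    v = x @ w_v                                   # (N, C)
    r = q.shape[-1]
    attn = softmax(q @ k.T / np.sqrt(r), axis=-1)  # (N, N), rows sum to 1
    return attn @ v, attn                         # (N, C), (N, N)

def score_macs(n_tokens, width):
    # Multiply-accumulates needed to form the N x N score matrix.
    return n_tokens * n_tokens * width

rng = np.random.default_rng(0)
N, C, r = 16, 32, 1  # illustrative sizes only
x = rng.standard_normal((N, C))
w_q = rng.standard_normal((C, r)) / np.sqrt(C)
w_k = rng.standard_normal((C, r)) / np.sqrt(C)
w_v = rng.standard_normal((C, C)) / np.sqrt(C)

out, attn = channel_reduced_attention(x, w_q, w_k, w_v)
saving = score_macs(N, C) // score_macs(N, r)  # cost ratio of the score step
```

Under these assumptions the score matrix still relates every token to every other token, so the global receptive field is kept, while the cost of forming the scores drops by a factor of C / r.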

Findings

01. Reports better results than previous state-of-the-art methods on the ADE20K, Cityscapes, COCO-Stuff, and Synapse datasets.

02. Uses a novel Channel Reduction Attention (CRA) module for efficient global context extraction.

03. Demonstrates the effectiveness of the MetaFormer architecture in both the backbone and the decoder for segmentation.
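For readers unfamiliar with the term, a MetaFormer block is commonly described as two residual sub-blocks: a normalization followed by a token mixer, then a normalization followed by a channel MLP. The sketch below shows that general template in NumPy with a pluggable token mixer. The mean-pooling mixer, the ReLU activation, the affine-free layer norm, and all names are simplifying assumptions for illustration and are not taken from MetaSeg.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token over its channels (no learnable scale or shift).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def channel_mlp(x, w1, w2):
    # Two-layer MLP applied to each token independently (ReLU for simplicity).
    return np.maximum(x @ w1, 0.0) @ w2

def metaformer_block(x, token_mixer, w1, w2):
    """Generic MetaFormer-style block; token_mixer is any (N, C) -> (N, C) function."""
    x = x + token_mixer(layer_norm(x))          # mix information across tokens
    x = x + channel_mlp(layer_norm(x), w1, w2)  # mix information across channels
    return x

def mean_mixer(x):
    # Hypothetical, simplest possible global mixer: every token sees the mean token.
    return np.broadcast_to(x.mean(axis=0, keepdims=True), x.shape)

rng = np.random.default_rng(0)
N, C, H = 8, 16, 64  # illustrative sizes only
tokens = rng.standard_normal((N, C))
w1 = rng.standard_normal((C, H)) / np.sqrt(C)
w2 = rng.standard_normal((H, C)) / np.sqrt(H)

mixed = metaformer_block(tokens, mean_mixer, w1, w2)
```

Because the token mixer is just a function argument, the same block template can host attention, pooling, or convolution, which is the sense in which one architecture can serve both a backbone and a decoder.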

Abstract

Beyond the Transformer, it is important to explore how to exploit the capacity of the MetaFormer, an architecture that is fundamental to the performance improvements of the Transformer. Previous studies have exploited it only for the backbone network. Unlike previous studies, we explore the capacity of the MetaFormer architecture more extensively in the semantic segmentation task. We propose a powerful semantic segmentation network, MetaSeg, which leverages the MetaFormer architecture from the backbone to the decoder. Our MetaSeg shows that the MetaFormer architecture plays a significant role in capturing useful contexts for the decoder as well as for the backbone. In addition, recent segmentation methods have shown that using a CNN-based backbone for extracting the spatial information and a decoder for extracting the global information is more effective than using a…

Code & Models

Repositories

hyunwoo137/metaseg · PyTorch · Official

Videos

MetaSeg: MetaFormer-Based Global Contexts-Aware Network for Efficient Semantic Segmentation · YouTube

Taxonomy

Topics: Robotics and Automated Systems · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization

Methods: Linear Layer · Layer Normalization · Multi-Head Attention · Attention Is All You Need · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Softmax · Absolute Position Encodings · Dense Connections