Masked-attention Mask Transformer for Universal Image Segmentation

Bowen Cheng; Ishan Misra; Alexander G. Schwing; Alexander Kirillov; Rohit Girdhar

arXiv:2112.01527 · cs.CV · June 17, 2022 · 129 cites

TL;DR

Mask2Former is a single image segmentation architecture that handles panoptic, instance, and semantic segmentation, outperforming the best specialized methods on four popular datasets.

Contribution

Introduces Mask2Former, a unified transformer-based architecture capable of handling panoptic, instance, and semantic segmentation tasks with improved efficiency and accuracy.

Findings

01. Outperforms the best specialized architectures on four popular datasets.

02. Sets a new state of the art for panoptic, instance, and semantic segmentation.

03. Reduces research effort by at least three times, since a single architecture covers all three tasks.

Abstract

Image segmentation is about grouping pixels with different semantics, e.g., category or instance membership, where each choice of semantics defines a task. While only the semantics of each task differ, current research focuses on designing specialized architectures for each task. We present Masked-attention Mask Transformer (Mask2Former), a new architecture capable of addressing any image segmentation task (panoptic, instance or semantic). Its key components include masked attention, which extracts localized features by constraining cross-attention within predicted mask regions. In addition to reducing the research effort by at least three times, it outperforms the best specialized architectures by a significant margin on four popular datasets. Most notably, Mask2Former sets a new state-of-the-art for panoptic segmentation (57.8 PQ on COCO), instance segmentation (50.1 AP on COCO) and…
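
The masked attention described in the abstract can be illustrated with a minimal sketch. It assumes single-head scaled dot-product attention, a 0.5 threshold for binarizing the previous layer's mask prediction, and a fallback to full attention when a mask is empty. The function name `masked_cross_attention`, its parameters, and the toy inputs are hypothetical and chosen for illustration. The residual connection and learned projections of a real decoder layer are omitted.

```python
import math


def masked_cross_attention(queries, keys, values, mask_probs, threshold=0.5):
    """Cross-attention where each query attends only to pixels inside its mask.

    queries:    N query vectors, each of length d
    keys:       P key vectors, one per pixel, each of length d
    values:     P value vectors, one per pixel
    mask_probs: N x P predicted mask probabilities from the previous layer
    """
    d = len(queries[0])
    outputs = []
    for q, probs in zip(queries, mask_probs):
        # Binarize the previous mask prediction to decide which pixels are visible.
        allowed = [p >= threshold for p in probs]
        # Assumption of this sketch: an empty mask falls back to full attention,
        # so the softmax never runs over zero pixels.
        if not any(allowed):
            allowed = [True] * len(probs)
        # Scaled dot-product scores; pixels outside the mask get -inf.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) if ok else -math.inf
            for k, ok in zip(keys, allowed)
        ]
        # Numerically stable softmax; exp(-inf) evaluates to 0.0.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


# Toy example: one query, three pixels; the third pixel lies outside the mask.
queries = [[1.0, 0.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
values = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]
mask_probs = [[0.9, 0.8, 0.1]]
out = masked_cross_attention(queries, keys, values, mask_probs)
```

In the toy example the third pixel has the largest raw score but contributes nothing to `out`, because its mask probability falls below the threshold. This is the localization effect the abstract attributes to masked attention.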

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Medical Image Segmentation Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Dense Connections · Byte Pair Encoding · Label Smoothing · Absolute Position Encodings · Residual Connection · Softmax