Per-Pixel Classification is Not All You Need for Semantic Segmentation

Bowen Cheng; Alexander G. Schwing; Alexander Kirillov

arXiv:2107.06278 · cs.CV · November 2, 2021 · 167 cites

TL;DR

This paper introduces MaskFormer, a unified mask classification approach for semantic and panoptic segmentation that simplifies existing methods and achieves state-of-the-art results by predicting sets of binary masks with associated class labels.

Contribution

The paper proposes MaskFormer, a unified mask classification model that handles both semantic- and instance-level segmentation with the same model, loss, and training procedure.

Findings

01. MaskFormer outperforms per-pixel classification baselines when the number of classes is large.

02. MaskFormer achieves 55.6 mIoU on ADE20K for semantic segmentation.

03. MaskFormer achieves 52.7 PQ on COCO for panoptic segmentation.

Abstract

Modern approaches typically formulate semantic segmentation as a per-pixel classification task, while instance-level segmentation is handled with an alternative mask classification. Our key insight: mask classification is sufficiently general to solve both semantic- and instance-level segmentation tasks in a unified manner using the exact same model, loss, and training procedure. Following this observation, we propose MaskFormer, a simple mask classification model which predicts a set of binary masks, each associated with a single global class label prediction. Overall, the proposed mask classification-based method simplifies the landscape of effective approaches to semantic and panoptic segmentation tasks and shows excellent empirical results. In particular, we observe that MaskFormer outperforms per-pixel classification baselines when the number of classes is large. Our mask…
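To make the abstract's description concrete, the sketch below shows one way a set of binary masks, each carrying a single class prediction, can be collapsed into a per-pixel semantic map: every mask votes for its classes, weighted by its per-pixel mask probability, and each pixel takes the highest-scoring class. This is an illustrative NumPy sketch, not the authors' implementation. The function name `semantic_inference`, the array shapes, and the trailing "no object" column are assumptions made for the example.

```python
import numpy as np


def semantic_inference(class_logits, mask_logits):
    """Collapse N (mask, class) predictions into one semantic label map.

    class_logits: (N, K + 1) array; the last column is an assumed
                  "no object" class that is dropped before voting.
    mask_logits:  (N, H, W) array of binary mask logits, one per prediction.
    Returns an (H, W) integer array of class indices in [0, K).
    """
    # Numerically stable softmax over the K + 1 class scores of each mask.
    shifted = class_logits - class_logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    class_probs = probs[:, :-1]  # (N, K), "no object" column removed

    # Per-pixel sigmoid turns each mask's logits into membership probabilities.
    mask_probs = 1.0 / (1.0 + np.exp(-mask_logits))  # (N, H, W)

    # Sum over the N predictions: score[k, h, w] = sum_n p_n(k) * m_n[h, w].
    scores = np.einsum("nk,nhw->khw", class_probs, mask_probs)
    return scores.argmax(axis=0)


# Toy input: two predictions on a 2x2 image with K = 2 classes.
class_logits = np.array([[5.0, 0.0, 0.0],   # prediction 0 favours class 0
                         [0.0, 5.0, 0.0]])  # prediction 1 favours class 1
mask_logits = np.array([[[10.0, -10.0], [10.0, -10.0]],   # left column
                        [[-10.0, 10.0], [-10.0, 10.0]]])  # right column
labels = semantic_inference(class_logits, mask_logits)
print(labels)
```

In the toy input, the first prediction covers the left column and favours class 0, while the second covers the right column and favours class 1, so the resulting map assigns class 0 to the left pixels and class 1 to the right pixels. Note that the class prediction is global to each mask: a per-pixel classifier would instead emit K scores at every location.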

Code & Models

Videos

Per-Pixel Classification is Not All You Need for Semantic Segmentation · SlidesLive

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques