MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers

Huiyu Wang; Yukun Zhu; Hartwig Adam; Alan Yuille; Liang-Chieh Chen

arXiv:2012.00759·cs.CV·July 14, 2021·32 cites

TL;DR

MaX-DeepLab is an end-to-end panoptic segmentation model built on a mask transformer. It removes surrogate sub-tasks such as box detection and non-maximum suppression, and it reports state-of-the-art results on the COCO dataset.

Contribution

The paper proposes the first end-to-end model for panoptic segmentation. The model predicts class-labeled masks directly with a mask transformer, which simplifies the pipeline and improves performance.

Findings

01 A 7.1% PQ gain in the box-free regime on COCO

02 A 3.0% PQ improvement over DETR with a similar number of parameters

03 51.3% PQ on the COCO test-dev set
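All three findings are reported in panoptic quality (PQ). As a reminder of what that number measures, the sketch below computes PQ for a single class using the standard definition: the sum of IoUs over matched segment pairs, divided by the number of true positives plus half the false positives and half the false negatives. The function name and arguments are illustrative and are not taken from the paper's code.

```python
def panoptic_quality(tp_ious, num_false_positives, num_false_negatives):
    """Panoptic quality for one class.

    tp_ious: IoU of every matched (prediction, ground truth) segment pair.
             Under the PQ definition a pair counts as matched only if IoU > 0.5.
    """
    if any(iou <= 0.5 for iou in tp_ious):
        raise ValueError("matched pairs must have IoU > 0.5")
    denominator = (
        len(tp_ious) + 0.5 * num_false_positives + 0.5 * num_false_negatives
    )
    if denominator == 0:
        # No segments of this class in either prediction or ground truth.
        return 0.0
    return sum(tp_ious) / denominator


# Two matched segments, one spurious prediction, one missed ground-truth segment.
pq = panoptic_quality([0.8, 0.6], num_false_positives=1, num_false_negatives=1)
print(f"PQ = {pq:.3f}")
```

Benchmark PQ, as reported on COCO, is the average of this per-class value over all classes.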

Abstract

We present MaX-DeepLab, the first end-to-end model for panoptic segmentation. Our approach simplifies the current pipeline that depends heavily on surrogate sub-tasks and hand-designed components, such as box detection, non-maximum suppression, thing-stuff merging, etc. Although these sub-tasks are tackled by area experts, they fail to comprehensively solve the target task. By contrast, our MaX-DeepLab directly predicts class-labeled masks with a mask transformer, and is trained with a panoptic quality inspired loss via bipartite matching. Our mask transformer employs a dual-path architecture that introduces a global memory path in addition to a CNN path, allowing direct communication with any CNN layers. As a result, MaX-DeepLab shows a significant 7.1% PQ gain in the box-free regime on the challenging COCO dataset, closing the gap between box-based and box-free methods for the first…
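The bipartite matching step mentioned in the abstract can be illustrated with a small sketch. It assumes a similarity between a ground-truth mask and a predicted mask equal to the predicted probability of the ground-truth class times the Dice overlap of the two masks; this is one PQ-style choice and is an assumption here, since the abstract does not spell out the formula. For brevity the sketch uses brute-force search over assignments as a stand-in for a Hungarian-style solver, so it is only suitable for a handful of masks. All names are hypothetical.

```python
from itertools import permutations


def dice(mask_a, mask_b):
    """Dice overlap between two flattened masks with values in [0, 1]."""
    intersection = sum(a * b for a, b in zip(mask_a, mask_b))
    total = sum(mask_a) + sum(mask_b)
    return 2.0 * intersection / total if total > 0 else 0.0


def similarity(ground_truth, prediction):
    """Assumed similarity: P(correct class) * Dice(masks)."""
    gt_class, gt_mask = ground_truth
    class_probs, pred_mask = prediction
    return class_probs[gt_class] * dice(gt_mask, pred_mask)


def match(ground_truths, predictions):
    """One-to-one assignment of ground truths to predictions that maximizes
    total similarity. Requires len(predictions) >= len(ground_truths).
    Brute force: a stand-in for the Hungarian algorithm on tiny inputs."""
    best_score, best_pairs = -1.0, []
    for perm in permutations(range(len(predictions)), len(ground_truths)):
        score = sum(
            similarity(ground_truths[i], predictions[j])
            for i, j in enumerate(perm)
        )
        if score > best_score:
            best_score, best_pairs = score, list(enumerate(perm))
    return best_pairs, best_score


# Two ground-truth masks on a 4-pixel image, three predicted masks.
ground_truths = [(0, [1, 1, 0, 0]), (1, [0, 0, 1, 1])]
predictions = [
    ([0.1, 0.9], [0, 0, 1, 1]),  # confident class 1, covers right half
    ([0.8, 0.2], [1, 1, 0, 0]),  # confident class 0, covers left half
    ([0.5, 0.5], [0, 0, 0, 0]),  # empty mask, should stay unmatched
]
pairs, score = match(ground_truths, predictions)
print(pairs, round(score, 3))
```

In a training loop of this kind, matched predictions would receive a loss that pushes their similarity up, while unmatched predictions would be trained toward a "no object" outcome; those details are not shown here.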

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Attention Is All You Need · Softmax · Dropout · Byte Pair Encoding · Label Smoothing · Multi-Head Attention · Adam