MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers
Huiyu Wang, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen

TL;DR
MaX-DeepLab introduces an end-to-end panoptic segmentation model using mask transformers, eliminating complex sub-tasks and achieving state-of-the-art results on COCO dataset.
Contribution
It is the first to propose an end-to-end panoptic segmentation model with a mask transformer, simplifying the pipeline and improving performance.
Findings
7.1% PQ gain in box-free regime on COCO
Improves 3.0% PQ over DETR with similar parameters
Achieves 51.3% PQ on COCO test-dev set
Abstract
We present MaX-DeepLab, the first end-to-end model for panoptic segmentation. Our approach simplifies the current pipeline that depends heavily on surrogate sub-tasks and hand-designed components, such as box detection, non-maximum suppression, thing-stuff merging, etc. Although these sub-tasks are tackled by area experts, they fail to comprehensively solve the target task. By contrast, our MaX-DeepLab directly predicts class-labeled masks with a mask transformer, and is trained with a panoptic quality inspired loss via bipartite matching. Our mask transformer employs a dual-path architecture that introduces a global memory path in addition to a CNN path, allowing direct communication with any CNN layers. As a result, MaX-DeepLab shows a significant 7.1% PQ gain in the box-free regime on the challenging COCO dataset, closing the gap between box-based and box-free methods for the first…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
MethodsLinear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Attention Is All You Need · Softmax · Dropout · Byte Pair Encoding · Label Smoothing · Multi-Head Attention · Adam
