MP-Former: Mask-Piloted Transformer for Image Segmentation

Hao Zhang; Feng Li; Huaizhe Xu; Shijia Huang; Shilong Liu; Lionel M. Ni; Lei Zhang

arXiv:2303.07336 · cs.CV · March 16, 2023 · 6 cites

TL;DR

MP-Former introduces a mask-piloted training method for image segmentation that enhances mask prediction consistency across decoder layers, leading to improved accuracy and faster training without extra inference cost.

Contribution

The paper proposes a novel mask-piloted training approach that addresses mask prediction inconsistency in Mask2Former, significantly boosting segmentation performance and training efficiency.

Findings

01. Achieves +2.3 AP on instance segmentation and +1.6 mIoU on semantic segmentation on Cityscapes with a ResNet-50 backbone.

02. Speeds up training, outperforming Mask2Former with fewer training epochs.

03. Requires minimal additional computation during training and none during inference.

Abstract

We present a mask-piloted Transformer which improves masked-attention in Mask2Former for image segmentation. The improvement is based on our observation that Mask2Former suffers from inconsistent mask predictions between consecutive decoder layers, which leads to inconsistent optimization goals and low utilization of decoder queries. To address this problem, we propose a mask-piloted training approach, which additionally feeds noised ground-truth masks in masked-attention and trains the model to reconstruct the original ones. Compared with the predicted masks used in masked-attention, the ground-truth masks serve as a pilot and effectively alleviate the negative impact of inaccurate mask predictions in Mask2Former. Based on this technique, our MP-Former achieves a remarkable performance improvement on all three image segmentation tasks (instance, panoptic, and semantic), yielding +2.3 AP and…
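The mask-piloted idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Every name in it (`flip_noise`, `attention_block_mask`, `build_piloted_attention`, `flip_prob`) is hypothetical, and random pixel flipping is only an assumed stand-in for whatever noise the paper applies to the ground-truth masks. The sketch shows how a second group of "pilot" queries could receive attention masks derived from noised ground-truth masks, while the ordinary queries keep using predicted masks and the clean ground-truth masks remain the reconstruction targets.

```python
import random


def flip_noise(mask, flip_prob, rng):
    """Return a copy of a binary mask with each pixel flipped with
    probability flip_prob (assumed noise type, for illustration only)."""
    return [
        [1 - px if rng.random() < flip_prob else px for px in row]
        for row in mask
    ]


def attention_block_mask(mask):
    """Flatten a binary mask into a per-pixel 'blocked' list.

    True means the query may NOT attend to that pixel (background).
    If the mask has no foreground at all, nothing is blocked, so the
    query never ends up with an empty attention region.
    """
    blocked = [px == 0 for row in mask for px in row]
    if all(blocked):
        return [False] * len(blocked)
    return blocked


def build_piloted_attention(pred_masks, gt_masks, flip_prob=0.2, seed=0):
    """Build attention masks for ordinary queries and pilot queries.

    pred_masks: binary masks predicted by the previous decoder layer.
    gt_masks:   clean ground-truth binary masks (training only).
    Returns (blocked, noised, targets): blocked lists the ordinary
    queries first, then the pilot queries; targets are the clean
    masks the pilot queries are trained to reconstruct.
    """
    rng = random.Random(seed)
    noised = [flip_noise(m, flip_prob, rng) for m in gt_masks]
    blocked = [attention_block_mask(m) for m in pred_masks]
    blocked += [attention_block_mask(m) for m in noised]
    return blocked, noised, gt_masks
```

In this sketch the pilot branch exists only in `build_piloted_attention`, which needs ground-truth masks. At inference no ground truth is available, so only the predicted-mask branch would run, which is consistent with the claim that the method adds no inference cost.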

Code & Models

Repositories

idea-research/mp-former (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Residual Connection · Dense Connections · Absolute Position Encodings · Linear Layer · Label Smoothing · Dropout · Adam · Softmax