Vision Transformers Are Good Mask Auto-Labelers

Shiyi Lan; Xitong Yang; Zhiding Yu; Zuxuan Wu; Jose M. Alvarez; Anima Anandkumar

arXiv:2301.03992·cs.CV·January 11, 2023

Open Access

TL;DR

This paper introduces Mask Auto-Labeler (MAL), a Transformer-based framework that generates high-quality mask pseudo-labels from box annotations, allowing instance segmentation models trained on those pseudo-labels to approach fully supervised performance.

Contribution

The paper demonstrates that Vision Transformers can effectively auto-label masks from box annotations, significantly narrowing the gap with human annotations in instance segmentation.

Findings

1. MAL's best model achieves 44.1% mAP on COCO instance segmentation (test-dev 2017), surpassing previous box-supervised methods.

2. Qualitatively, masks generated by MAL are in some cases better than human annotations.

3. Instance segmentation models trained with MAL masks retain up to 97.4% of fully supervised performance.

Abstract

We propose Mask Auto-Labeler (MAL), a high-quality Transformer-based mask auto-labeling framework for instance segmentation using only box annotations. MAL takes box-cropped images as inputs and conditionally generates their mask pseudo-labels. We show that Vision Transformers are good mask auto-labelers. Our method significantly reduces the gap between auto-labeling and human annotation in mask quality. Instance segmentation models trained on MAL-generated masks nearly match their fully supervised counterparts, retaining up to 97.4% of the fully supervised performance. The best model achieves 44.1% mAP on COCO instance segmentation (test-dev 2017), outperforming state-of-the-art box-supervised methods by significant margins. Qualitative results indicate that masks produced by MAL are, in some cases, even better than human annotations.
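
The abstract states only that MAL consumes box-cropped images; it does not describe the cropping step itself. As a rough illustration of what "box-cropped inputs" could look like, the sketch below cuts one patch per box annotation from an image array. The function name `crop_boxes`, the `pad` context margin, and the `(x0, y0, x1, y1)` pixel box format are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def crop_boxes(image, boxes, pad=0):
    """Crop one patch per box from an H x W x C image array.

    boxes: iterable of (x0, y0, x1, y1) in pixels, with x1/y1 exclusive.
    pad:   hypothetical context margin in pixels (an assumption of this
           sketch, not a parameter described in the abstract).
    """
    h, w = image.shape[:2]
    crops = []
    for x0, y0, x1, y1 in boxes:
        # Expand the box by the margin, then clamp to the image bounds.
        x0c, y0c = max(0, int(x0) - pad), max(0, int(y0) - pad)
        x1c, y1c = min(w, int(x1) + pad), min(h, int(y1) + pad)
        crops.append(image[y0c:y1c, x0c:x1c].copy())
    return crops

# Placeholder image and boxes, purely for illustration.
image = np.zeros((100, 200, 3), dtype=np.uint8)
crops = crop_boxes(image, [(10, 20, 60, 80), (150, 50, 200, 100)], pad=5)
print([c.shape for c in crops])
```

Each resulting patch would then be the per-instance input for which a mask pseudo-label is predicted; any resizing or normalization a real pipeline applies before the network is omitted here.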

Taxonomy

Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Infrastructure Maintenance and Monitoring