BoxInst: High-Performance Instance Segmentation with Box Annotations
Zhi Tian, Chunhua Shen, Xinlong Wang, Hao Chen

TL;DR
This paper introduces a redesigned mask loss for instance segmentation that trains with bounding-box annotations only, raising the best previously reported box-supervised mask AP on COCO from 21.1% (Hsu et al., 2019) to 31.6%.
Contribution
The paper proposes a simple yet effective loss redesign that enables high-quality instance segmentation with only box annotations, without modifying the segmentation network architecture.
Findings
Achieves 33.2% mask AP on COCO test-dev with only box annotations.
Redesigns the mask loss to improve weakly supervised segmentation performance.
Narrows the gap between weakly and fully supervised instance segmentation.
Abstract
We present a high-performance method that can achieve mask-level instance segmentation with only bounding-box annotations for training. While this setting has been studied in the literature, here we show significantly stronger performance with a simple design (e.g., dramatically improving previous best reported mask AP of 21.1% in Hsu et al. (2019) to 31.6% on the COCO dataset). Our core idea is to redesign the loss of learning masks in instance segmentation, with no modification to the segmentation network itself. The new loss functions can supervise the mask training without relying on mask annotations. This is made possible with two loss terms, namely, 1) a surrogate term that minimizes the discrepancy between the projections of the ground-truth box and the predicted mask; 2) a pairwise loss that can exploit the prior that proximal pixels with similar colors are very likely to have…
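The first loss term described in the abstract, which compares the projections of the ground-truth box with those of the predicted mask, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the discrepancy measure, so a Dice loss is assumed here, projection is taken as a max over each axis, and all function names (`project`, `dice_loss`, `box_to_mask`, `projection_loss`) are hypothetical. The pairwise color term is omitted because its description is truncated above.

```python
import numpy as np


def project(mask):
    """Project a 2D mask onto the x and y axes via a max over each axis."""
    proj_x = mask.max(axis=0)  # one value per column, shape (W,)
    proj_y = mask.max(axis=1)  # one value per row, shape (H,)
    return proj_x, proj_y


def dice_loss(pred, target, eps=1e-6):
    """Dice loss between two 1D vectors (assumed discrepancy measure)."""
    inter = (pred * target).sum()
    denom = (pred ** 2).sum() + (target ** 2).sum()
    return 1.0 - 2.0 * inter / (denom + eps)


def box_to_mask(box, height, width):
    """Rasterize a box (x0, y0, x1, y1), end-exclusive, into a binary mask."""
    x0, y0, x1, y1 = box
    mask = np.zeros((height, width), dtype=np.float64)
    mask[y0:y1, x0:x1] = 1.0
    return mask


def projection_loss(pred_mask, box):
    """Sum of Dice losses between the x and y projections of the
    predicted mask and those of the ground-truth box."""
    height, width = pred_mask.shape
    gt_x, gt_y = project(box_to_mask(box, height, width))
    pr_x, pr_y = project(pred_mask)
    return dice_loss(pr_x, gt_x) + dice_loss(pr_y, gt_y)


# Hypothetical example: a cross-shaped mask that touches all four box sides.
box = (2, 1, 6, 5)
cross = np.zeros((8, 8))
cross[2, 2:6] = 1.0   # horizontal bar spans the box width
cross[1:5, 3] = 1.0   # vertical bar spans the box height
loss_cross = projection_loss(cross, box)
loss_empty = projection_loss(np.zeros((8, 8)), box)
```

In this sketch, any mask whose extent matches the box on both axes gets a near-zero projection loss regardless of its shape inside the box, which is why a second term is needed to constrain the mask interior.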
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
