DiscoBox: Weakly Supervised Instance Segmentation and Semantic Correspondence from Box Supervision
Shiyi Lan, Zhiding Yu, Christopher Choy, Subhashree Radhakrishnan, Guilin Liu, Yuke Zhu, Larry S. Davis, Anima Anandkumar

TL;DR
DiscoBox is a framework that jointly learns instance segmentation and semantic correspondence from bounding box supervision, using a structured teacher model to improve both tasks and achieve competitive results.
Contribution
It introduces a self-ensembling framework with a structured energy model that jointly refines segmentation and correspondence, advancing weakly supervised learning methods.
Findings
Achieves 37.9% AP on COCO instance segmentation
Surpasses prior weakly supervised methods
Attains state-of-the-art results on PASCAL VOC12 and PF-PASCAL
Abstract
We introduce DiscoBox, a novel framework that jointly learns instance segmentation and semantic correspondence using bounding box supervision. Specifically, we propose a self-ensembling framework where instance segmentation and semantic correspondence are jointly guided by a structured teacher in addition to the bounding box supervision. The teacher is a structured energy model incorporating a pairwise potential and a cross-image potential to model the pairwise pixel relationships both within and across the boxes. Minimizing the teacher energy simultaneously yields refined object masks and dense correspondences between intra-class objects, which are taken as pseudo-labels to supervise the task network and provide positive/negative correspondence pairs for dense contrastive learning. We show a symbiotic relationship where the two tasks mutually benefit from each other. Our best model…
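The abstract mentions positive/negative correspondence pairs for dense contrastive learning but does not spell out the loss. As a rough illustration only, the sketch below assumes an InfoNCE-style objective over per-pixel features: each matched pixel pair is a positive, and every other pixel in the second image is a negative. The function names, the `matches` dictionary, and the `temperature` value are hypothetical and are not taken from the paper.

```python
import math


def l2_normalize(v):
    """Scale a feature vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dense_info_nce(feats_a, feats_b, matches, temperature=0.1):
    """Mean InfoNCE-style loss over matched pixel pairs (illustrative assumption).

    feats_a, feats_b: lists of feature vectors, one per pixel.
    matches: dict mapping a pixel index in A to its positive index in B;
             all other pixels in B act as negatives for that pixel.
    """
    if not matches:
        return 0.0
    a = [l2_normalize(v) for v in feats_a]
    b = [l2_normalize(v) for v in feats_b]
    total = 0.0
    for i, j_pos in matches.items():
        # Cosine similarity of pixel i in A to every pixel in B, scaled by temperature.
        logits = [sum(x * y for x, y in zip(a[i], bj)) / temperature for bj in b]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of the positive match.
        total += log_denom - logits[j_pos]
    return total / len(matches)


# Toy usage: two pixels per image with orthogonal features.
feats_a = [[1.0, 0.0], [0.0, 1.0]]
feats_b = [[1.0, 0.0], [0.0, 1.0]]
loss_good = dense_info_nce(feats_a, feats_b, {0: 0, 1: 1})
loss_bad = dense_info_nce(feats_a, feats_b, {0: 1, 1: 0})
```

With these toy features, the loss for the correct matches is lower than the loss for the swapped matches, which is the behavior a contrastive objective of this kind is meant to reward.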
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
