Discovering Object Masks with Transformers for Unsupervised Semantic Segmentation
Wouter Van Gansbeke, Simon Vandenhende, Luc Van Gool

TL;DR
MaskDistill introduces a data-driven, unsupervised framework for semantic segmentation that generates and refines object masks without handcrafted priors, outperforming previous methods on PASCAL and COCO datasets.
Contribution
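The paper proposes MaskDistill, an unsupervised semantic segmentation framework that replaces handcrafted priors with data-driven object masks. It clusters these masks into pseudo-ground-truth for training an initial object segmentation model, then uses that model to filter out low-quality masks.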
Findings
Outperforms previous methods on PASCAL (+11% mIoU) and COCO (+4% mask AP50)
Does not rely on low-level image cues or object-centric assumptions
Generates high-quality object masks without handcrafted priors
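For readers unfamiliar with the mIoU metric cited above, the following is a minimal sketch of how mean intersection-over-union can be computed over flattened label maps. The function name `mean_iou` and the toy inputs are illustrative assumptions, not taken from the paper. In unsupervised evaluation, predicted cluster indices are typically matched to ground-truth classes first; this sketch assumes that matching has already been done.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that appear in either pred or target.

    pred and target are flat sequences of integer class labels,
    one entry per pixel (hypothetical toy format for illustration).
    """
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six pixels, two classes
pred = [0, 0, 1, 1, 1, 0]
target = [0, 0, 1, 1, 0, 0]
score = mean_iou(pred, target, num_classes=2)
```

Here class 0 has IoU 3/4 and class 1 has IoU 2/3, so `score` is their average.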
Abstract
The task of unsupervised semantic segmentation aims to cluster pixels into semantically meaningful groups. Specifically, pixels assigned to the same cluster should share high-level semantic properties like their object or part category. This paper presents MaskDistill: a novel framework for unsupervised semantic segmentation based on three key ideas. First, we advocate a data-driven strategy to generate object masks that serve as a pixel grouping prior for semantic segmentation. This approach omits handcrafted priors, which are often designed for specific scene compositions and limit the applicability of competing frameworks. Second, MaskDistill clusters the object masks to obtain pseudo-ground-truth for training an initial object segmentation model. Third, we leverage this model to filter out low-quality object masks. This strategy mitigates the noise in our pixel grouping prior and…
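The second and third ideas in the abstract (clustering object masks into pseudo-labels, then filtering low-quality masks) can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: it assumes each mask is already summarized by a feature vector, uses plain k-means as the clustering step, and filters by a per-mask confidence threshold. The names `kmeans`, `filter_masks`, and all numeric values are hypothetical.

```python
import math


def kmeans(points, k, iters=20):
    """Plain k-means; returns (labels, centroids).

    Initialization uses the first k points as centroids, an
    illustrative deterministic choice rather than a recommended one.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centroids[j] = [sum(x) / len(members) for x in zip(*members)]
    return labels, centroids


def filter_masks(confidences, threshold=0.5):
    """Return indices of masks whose confidence meets the threshold."""
    return [i for i, c in enumerate(confidences) if c >= threshold]


# Hypothetical per-mask feature vectors (one per object mask)
mask_features = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9)]
pseudo_labels, _ = kmeans(mask_features, k=2)

# Hypothetical confidences from an initial segmentation model
confidences = [0.9, 0.3, 0.7, 0.95]
kept = filter_masks(confidences, threshold=0.5)
```

The cluster index of each mask serves as its pseudo-label, and only masks listed in `kept` would be retained for further training. The threshold value is an assumption chosen for the example.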
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
