NamedMask: Distilling Segmenters from Complementary Foundation Models

Gyungin Shin; Weidi Xie; Samuel Albanie

arXiv:2209.11228 · cs.CV · September 23, 2022

TL;DR

NamedMask distills the complementary strengths of two foundation models, CLIP and DINO, to perform zero-label semantic segmentation (no pixel-level labels during training), achieving competitive results on VOC2012, COCO, and ImageNet-S.

Contribution

The paper introduces a method that generates high-quality segmentation masks without pixel labels by combining CLIP's ability to name image content with DINO's ability to capture the spatial extent of objects.

Findings

01. Achieves strong performance on VOC2012, COCO, and ImageNet-S datasets.
02. Effectively segments both single-object and multi-object images.
03. Outperforms prior methods in zero-label semantic segmentation.

Abstract

The goal of this work is to segment and name regions of images without access to pixel-level labels during training. To tackle this task, we construct segmenters by distilling the complementary strengths of two foundation models. The first, CLIP (Radford et al. 2021), exhibits the ability to assign names to image content but lacks an accessible representation of object structure. The second, DINO (Caron et al. 2021), captures the spatial extent of objects but has no knowledge of object names. Our method, termed NamedMask, begins by using CLIP to construct category-specific archives of images. These images are pseudo-labelled with a category-agnostic salient object detector bootstrapped from DINO, then refined by category-specific segmenters using the CLIP archive labels. Thanks to the high quality of the refined masks, we show that a standard segmentation architecture trained on these…
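The first step described in the abstract, using CLIP to build category-specific archives of images, can be pictured as a retrieval problem. The sketch below is illustrative only: it assumes each archive is formed by ranking images by cosine similarity to a category's text embedding and keeping the top k, which this page does not confirm. The function names, the `top_k` parameter, and the toy 2-D vectors are hypothetical stand-ins for real CLIP embeddings.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def build_archives(
    image_embeddings: Dict[str, Sequence[float]],
    text_embeddings: Dict[str, Sequence[float]],
    top_k: int,
) -> Dict[str, List[str]]:
    """For each category, keep the top_k images most similar to its text embedding.

    Assumption: top-k similarity ranking is a stand-in for however the
    archives are actually constructed; it is not taken from the paper.
    """
    archives: Dict[str, List[str]] = {}
    for category, text_vec in text_embeddings.items():
        ranked = sorted(
            image_embeddings,
            key=lambda image_id: cosine(image_embeddings[image_id], text_vec),
            reverse=True,
        )
        archives[category] = ranked[:top_k]
    return archives


# Toy 2-D vectors standing in for CLIP image and text embeddings (hypothetical data).
image_embeddings = {
    "img_a": [0.9, 0.1],
    "img_b": [0.2, 0.8],
    "img_c": [0.7, 0.3],
}
text_embeddings = {
    "dog": [1.0, 0.0],
    "cat": [0.0, 1.0],
}

archives = build_archives(image_embeddings, text_embeddings, top_k=2)
print(archives)
```

In this picture, every image in an archive inherits its category name from the archive it landed in, which is what would let the later category-agnostic masks from the DINO-based salient object detector be turned into named masks.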

Code & Models

Repositories

noelshin/namedmask (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning

Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Layer Normalization · Linear Layer · Dense Connections · Residual Connection · Vision Transformer · Contrastive Language-Image Pre-training