CLIP-DIY: CLIP Dense Inference Yields Open-Vocabulary Semantic Segmentation For-Free

Monika Wysoczańska; Michaël Ramamonjisoa; Tomasz Trzciński; Oriane Siméoni

arXiv:2309.14289·cs.CV·November 29, 2023

TL;DR

CLIP-DIY is an open-vocabulary semantic segmentation method that combines CLIP's zero-shot classification of image patches with unsupervised object localization, achieving state-of-the-art zero-shot results on PASCAL VOC without any additional training.

Contribution

It proposes a novel open-vocabulary segmentation approach that uses CLIP and unsupervised localization, eliminating the need for extra training or annotations.

Findings

01 State-of-the-art zero-shot results on PASCAL VOC
02 Competitive performance on COCO
03 No additional training required

Abstract

The emergence of CLIP has opened the way for open-world image perception. The zero-shot classification capabilities of the model are impressive but are harder to use for dense tasks such as image segmentation. Several methods have proposed different modifications and learning schemes to produce dense output. Instead, we propose in this work an open-vocabulary semantic segmentation method, dubbed CLIP-DIY, which does not require any additional training or annotations, but instead leverages existing unsupervised object localization approaches. In particular, CLIP-DIY is a multi-scale approach that directly exploits CLIP classification abilities on patches of different sizes and aggregates the decision in a single map. We further guide the segmentation using foreground/background scores obtained using unsupervised object localization methods. With our method, we obtain state-of-the-art…
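The multi-scale idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `segment_multiscale`, `toy_classifier`, the non-overlapping tiling, the score averaging, and the saliency threshold are all assumptions made for illustration, and `classify_patch` is a hypothetical stand-in for CLIP zero-shot classification of a patch.

```python
import numpy as np


def segment_multiscale(image, classify_patch, num_classes,
                       patch_sizes=(2, 4), saliency=None, fg_threshold=0.5):
    """Average patch-level class scores over several patch sizes.

    classify_patch is a hypothetical stand-in for CLIP zero-shot
    classification: it maps a patch to a (num_classes,) score vector.
    """
    height, width = image.shape[:2]
    scores = np.zeros((num_classes, height, width))
    for size in patch_sizes:
        # Tile the image with non-overlapping patches of this size.
        for top in range(0, height, size):
            for left in range(0, width, size):
                patch = image[top:top + size, left:left + size]
                probs = np.asarray(classify_patch(patch), dtype=float)
                # Every pixel in the patch receives the patch-level scores.
                scores[:, top:top + size, left:left + size] += probs[:, None, None]
    scores /= len(patch_sizes)
    labels = scores.argmax(axis=0)
    if saliency is not None:
        # Assumed guidance rule: low-saliency pixels become background (-1).
        labels = np.where(saliency >= fg_threshold, labels, -1)
    return labels


def toy_classifier(patch):
    """Toy replacement for CLIP: softmax over the mean red and green values."""
    means = patch.reshape(-1, patch.shape[-1]).mean(axis=0)[:2]
    exp = np.exp(means - means.max())
    return exp / exp.sum()


# Toy image: left half red (class 0), right half green (class 1).
image = np.zeros((4, 8, 3))
image[:, :4, 0] = 1.0
image[:, 4:, 1] = 1.0

# Hypothetical foreground map: only the top two rows count as foreground.
saliency = np.zeros((4, 8))
saliency[:2] = 1.0

labels = segment_multiscale(image, toy_classifier, num_classes=2)
guided = segment_multiscale(image, toy_classifier, num_classes=2, saliency=saliency)
```

In a real pipeline, `toy_classifier` would be replaced by a CLIP image-text similarity over the class prompts and `saliency` by the output of an unsupervised object localization method; how the paper combines the two is not specified in the text above, so the thresholding here is only one plausible choice.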

Code & Models

Repositories

wysoczanska/clip-diy (PyTorch, official)

Videos

CLIP-DIY: CLIP Dense Inference Yields Open-Vocabulary Semantic Segmentation For-Free · YouTube

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning

Methods: Contrastive Language-Image Pre-training