ZegOT: Zero-shot Segmentation Through Optimal Transport of Text Prompts

Kwanyoung Kim, Yujin Oh, Jong Chul Ye

arXiv:2301.12171 · cs.CV · May 31, 2023 · 6 cites



TL;DR

ZegOT introduces a novel zero-shot segmentation method that uses optimal transport to match multiple text prompts with frozen image features, achieving state-of-the-art results without retraining CLIP.

Contribution

The paper presents a new optimal-transport-based approach to zero-shot segmentation, built around a Multiple Prompt Optimal Transport Solver (MPOT), that requires neither an additional image encoder nor retraining or tuning of CLIP.

Findings

1. Achieves state-of-the-art zero-shot segmentation performance.
2. Aligns multiple text prompts with visual features so that each prompt focuses on distinct visual semantic attributes.
3. Operates without retraining or modifying the CLIP model.

Abstract

The recent success of large-scale Contrastive Language-Image Pre-training (CLIP) has shown great promise for zero-shot semantic segmentation by transferring image-text aligned knowledge to pixel-level classification. However, existing methods usually require an additional image encoder or retraining/tuning of the CLIP module. Here, we propose a novel Zero-shot segmentation with Optimal Transport (ZegOT) method that matches multiple text prompts with frozen image embeddings through optimal transport. In particular, we introduce a novel Multiple Prompt Optimal Transport Solver (MPOT), which is designed to learn an optimal mapping between multiple text prompts and the visual feature maps of the frozen image encoder's hidden layers. This unique mapping method enables each of the multiple text prompts to focus effectively on distinct visual semantic attributes. Through extensive experiments on…
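The abstract describes MPOT only at a high level. As a rough illustration of the underlying idea, the sketch below computes an entropic optimal transport plan (Sinkhorn iterations) between the pixel embeddings of one feature map and several text-prompt embeddings for a single class, then uses that plan to weight each prompt's cosine-similarity score. This is a generic Sinkhorn sketch, not the authors' MPOT implementation. The cosine cost, the uniform marginals, the regularization `reg`, the aggregation rule, and the function names `sinkhorn` and `multi_prompt_score_map` are all assumptions. The sketch also uses a single random feature map rather than features from several hidden layers of a frozen CLIP image encoder.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def sinkhorn(cost, reg=0.05, n_iters=100):
    """Entropic OT plan between uniform marginals on rows (pixels) and columns (prompts)."""
    m, n = cost.shape
    mu = np.full(m, 1.0 / m)      # each pixel carries equal mass (assumption)
    nu = np.full(n, 1.0 / n)      # each prompt receives equal mass (assumption)
    kernel = np.exp(-cost / reg)  # Gibbs kernel of the entropic problem
    u, v = np.ones(m), np.ones(n)
    for _ in range(n_iters):      # alternate row and column scaling
        u = mu / (kernel @ v)
        v = nu / (kernel.T @ u)
    return u[:, None] * kernel * v[None, :]  # plan of shape (m, n), total mass 1


def multi_prompt_score_map(pixel_feats, prompt_feats, reg=0.05, n_iters=100):
    """Score one class. pixel_feats: (H, W, D); prompt_feats: (N, D) prompts for that class."""
    h, w, d = pixel_feats.shape
    f = l2_normalize(pixel_feats.reshape(h * w, d))
    t = l2_normalize(prompt_feats)
    sim = f @ t.T                             # cosine similarity, shape (H*W, N)
    plan = sinkhorn(1.0 - sim, reg, n_iters)  # cost = 1 - cosine similarity
    # Hypothetical aggregation: weight each prompt's similarity by its transport
    # mass at that pixel; multiplying by H*W turns row masses (about 1/(H*W))
    # into per-pixel weights that sum to roughly 1.
    score = (plan * sim).sum(axis=1) * (h * w)
    return score.reshape(h, w), plan


# Toy usage with random stand-ins for frozen CLIP features (not real embeddings).
rng = np.random.default_rng(0)
pixels = rng.normal(size=(4, 4, 64))   # one H x W x D feature map
prompts = rng.normal(size=(3, 2, 64))  # 3 classes x 2 prompts each
scores = np.stack([multi_prompt_score_map(pixels, p)[0] for p in prompts])
segmentation = scores.argmax(axis=0)   # (4, 4) map of predicted class indices
```

Because the prompt-side marginal is uniform in this sketch, every prompt must receive the same total transport mass, so no single prompt can absorb all pixels. This is one plausible way to push different prompts toward different image regions, in line with the abstract's stated goal of having each prompt focus on distinct visual semantic attributes.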


Code & Models

Repositories

cubeyoung/OTSeg (PyTorch)


Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI

Methods: Contrastive Language-Image Pre-training