Rethinking Prior Information Generation with CLIP for Few-Shot Segmentation

Jin Wang; Bingfeng Zhang; Jian Pang; Honglong Chen; Weifeng Liu

arXiv:2405.08458 · cs.CV · May 15, 2024

TL;DR

This paper introduces a novel approach for few-shot segmentation that leverages CLIP's visual-text alignment to generate more reliable prior information, significantly improving performance over traditional high-level feature map methods.

Contribution

The authors propose training-free prior generation strategies using CLIP's semantic alignment, enhancing generalization and accuracy in few-shot segmentation tasks.
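As a rough illustration of what a training-free, text-aligned prior could look like, the sketch below scores each image patch against class text embeddings and keeps the probability assigned to the target class. It is not the authors' implementation: the function name `text_alignment_prior`, the softmax temperature, and the assumption that patch and text embeddings have already been extracted from a CLIP-like model are all hypothetical choices made for this example.

```python
import numpy as np

def text_alignment_prior(patch_emb, text_emb, target_idx, grid_hw, temperature=0.01):
    """Hypothetical sketch: per-patch probability of the target class.

    patch_emb: (N, D) patch embeddings, assumed to come from a CLIP-like image encoder.
    text_emb:  (C, D) text embeddings, one per class prompt (e.g. target and background).
    grid_hw:   (H, W) patch grid with H * W == N.
    """
    h, w = grid_hw
    eps = 1e-8
    # L2-normalize so the dot product is a cosine similarity.
    p = patch_emb / (np.linalg.norm(patch_emb, axis=1, keepdims=True) + eps)
    t = text_emb / (np.linalg.norm(text_emb, axis=1, keepdims=True) + eps)
    logits = (p @ t.T) / temperature               # (N, C)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=1, keepdims=True)     # softmax over classes
    return probs[:, target_idx].reshape(h, w)

# Usage with random stand-in embeddings (no real CLIP model is loaded here).
rng = np.random.default_rng(0)
demo_prior = text_alignment_prior(
    rng.normal(size=(16, 8)), rng.normal(size=(2, 8)), target_idx=0, grid_hw=(4, 4)
)
```

No parameters are updated anywhere in this sketch, which is the sense in which such a prior is training-free: it only reuses the alignment already learned by the pretrained encoders.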

Findings

01

Achieves state-of-the-art results on the PASCAL-5^i and COCO-20^i datasets.

02

Demonstrates superior generalization to unseen classes.

03

Improves prior guidance accuracy through high-order attention map relationships.
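This listing does not spell out how the high-order attention relationships are built. One plausible reading, sketched below purely as an assumption, is to propagate an initial prior through a row-normalized attention matrix raised to a power, so that each position also aggregates evidence from positions it reaches in several attention hops. The function name `refine_prior` and the `order` parameter are hypothetical.

```python
import numpy as np

def refine_prior(prior, attention, order=2):
    """Hypothetical sketch: smooth a prior map with multi-hop attention.

    prior:     (H, W) initial prior map.
    attention: (H*W, H*W) non-negative attention or affinity matrix.
    order:     number of attention hops (matrix power); an assumed knob.
    """
    eps = 1e-8
    # Row-normalize so each row is a distribution over positions.
    a = attention / (attention.sum(axis=1, keepdims=True) + eps)
    # A^order links positions that are connected through intermediate positions.
    high_order = np.linalg.matrix_power(a, order)
    refined = high_order @ prior.reshape(-1)
    return refined.reshape(prior.shape)

# Usage: a uniform attention matrix spreads the single active position evenly.
init = np.array([[1.0, 0.0], [0.0, 0.0]])
smoothed = refine_prior(init, np.ones((4, 4)), order=2)
```

With an identity attention matrix the prior is returned unchanged, which makes the sketch easy to sanity-check before plugging in real attention maps.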

Abstract

Few-shot segmentation remains challenging because labeling information for unseen classes is limited. Most previous approaches extract high-level feature maps from a frozen visual encoder and compute pixel-wise similarity as key prior guidance for the decoder. However, such a prior representation suffers from coarse granularity and poor generalization to new classes, since these high-level feature maps have obvious category bias. In this work, we propose to replace the visual prior representation with visual-text alignment capacity to capture more reliable guidance and enhance model generalization. Specifically, we design two kinds of training-free prior information generation strategies that use the semantic alignment capability of the Contrastive Language-Image Pre-training model (CLIP) to locate the target class. Besides, to acquire…
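For context, the conventional visual prior that the abstract criticizes can be sketched as follows. This is a minimal illustration under stated assumptions, not code from the paper: features are assumed to be already extracted by a frozen encoder, and taking the maximum similarity over support pixels followed by min-max normalization is a common practice assumed here. The name `visual_prior` is hypothetical.

```python
import numpy as np

def visual_prior(query_feat, support_feat, support_mask):
    """Hypothetical sketch of a pixel-wise similarity prior.

    query_feat, support_feat: (D, H, W) high-level feature maps.
    support_mask:             (H, W) binary foreground mask for the support image.
    """
    eps = 1e-8
    d, h, w = query_feat.shape
    q = query_feat.reshape(d, -1)                          # (D, HW)
    s = (support_feat * support_mask[None]).reshape(d, -1)  # keep foreground only
    # L2-normalize each pixel's feature vector for cosine similarity.
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + eps)
    s = s / (np.linalg.norm(s, axis=0, keepdims=True) + eps)
    sim = q.T @ s                                          # (HW_query, HW_support)
    prior = sim.max(axis=1)                                # best match per query pixel
    # Min-max normalize to [0, 1].
    prior = (prior - prior.min()) / (prior.max() - prior.min() + eps)
    return prior.reshape(h, w)

# Usage with random stand-in features.
rng = np.random.default_rng(0)
demo = visual_prior(
    rng.normal(size=(8, 4, 4)),
    rng.normal(size=(8, 4, 4)),
    (rng.random((4, 4)) > 0.5).astype(float),
)
```

Because the similarity is computed purely between visual features, any category bias in the encoder's high-level features carries straight into the prior, which is the weakness the abstract points to.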

Code & Models

Repositories

vangjin/PI-CLIP (PyTorch, official)

Taxonomy

Topics: Medical Imaging Techniques and Applications · Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications