ICPC: Instance-Conditioned Prompting with Contrastive Learning for Semantic Segmentation

Chaohui Yu; Qiang Zhou; Zhibin Wang; Fan Wang

arXiv:2308.07078·cs.CV·August 15, 2023·2 cites

TL;DR

This paper introduces ICPC, a framework that improves vision-text alignment for semantic segmentation through dynamic, image-conditioned prompting and contrastive learning, yielding consistent gains on ADE20K, COCO-Stuff10k, and ADE20K-Full.

Contribution

The paper proposes ICPC, a framework that combines instance-conditioned prompting with contrastive learning to improve vision-text alignment for semantic segmentation, and that outperforms state-of-the-art methods.

Findings

01

ICPC achieves consistent improvements on ADE20K, COCO-Stuff10k, and ADE20K-Full datasets.

02

With a ResNet-50 backbone, ICPC outperforms the state-of-the-art by 1.71%, 1.05%, and 1.41% mIoU on ADE20K, COCO-Stuff10k, and ADE20K-Full, respectively.

03

Dynamic prompting conditioned on image content enhances dense task performance.
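
The gains listed in the findings are reported in mIoU (mean intersection-over-union). As a reminder of what that metric measures, here is a minimal sketch in plain Python. It is not the paper's evaluation code: the function name `mean_iou`, the flat label lists, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration. Benchmark protocols usually accumulate intersections and unions over the whole dataset rather than per image.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in pred or target.

    pred, target: flat sequences of integer class labels, one per pixel
    (hypothetical layout chosen for this sketch).
    """
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes that never occur
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: four pixels, two classes.
pred = [0, 0, 1, 1]
target = [0, 1, 1, 1]
value = mean_iou(pred, target, num_classes=2)
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so `value` is their average.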

Abstract

Modern supervised semantic segmentation methods are usually finetuned based on the supervised or self-supervised models pre-trained on ImageNet. Recent work shows that transferring the knowledge from CLIP to semantic segmentation via prompt learning can achieve promising performance. The performance boost comes from the feature enhancement with multimodal alignment, i.e., the dot product between vision and text embeddings. However, how to improve the multimodal alignment for better transfer performance in dense tasks remains underexplored. In this work, we focus on improving the quality of vision-text alignment from two aspects of prompting design and loss function, and present an instance-conditioned prompting with contrastive learning (ICPC) framework. First, compared with the static prompt designs, we reveal that dynamic prompting conditioned on image content can more efficiently…
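
The abstract describes multimodal alignment as the dot product between vision and text embeddings. The sketch below illustrates that idea on toy data in plain Python. It is a simplified illustration under stated assumptions, not the ICPC implementation: the helper names (`l2_normalize`, `score_map`, `predict`), the nested-list tensor layout, and the L2 normalization step (which turns the dot product into a cosine similarity) are all hypothetical choices made here.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def score_map(pixel_feats, text_embeds):
    """Return an H x W x K map of pixel-to-class similarities.

    pixel_feats: H x W x C nested lists (hypothetical vision features).
    text_embeds: K x C nested lists (hypothetical class text embeddings).
    """
    text_unit = [l2_normalize(t) for t in text_embeds]
    out = []
    for row in pixel_feats:
        out_row = []
        for feat in row:
            f = l2_normalize(feat)
            # Dot product of one pixel feature with every class embedding.
            out_row.append([sum(a * b for a, b in zip(f, t)) for t in text_unit])
        out.append(out_row)
    return out


def predict(scores):
    """Pick the index of the best-matching class for every pixel."""
    return [[max(range(len(s)), key=s.__getitem__) for s in row] for row in scores]


# Toy 1 x 2 "image" with 2-dimensional features and two classes.
feats = [[[1.0, 0.0], [0.0, 2.0]]]
texts = [[3.0, 0.0], [0.0, 1.0]]
labels = predict(score_map(feats, texts))
```

Each pixel is assigned the class whose text embedding it aligns with most strongly, so the quality of the text embeddings (and therefore of the prompts that produce them) directly shapes the resulting score map.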

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications

Methods: Contrastive Learning · Focus · Contrastive Language-Image Pre-training