IntCoOp: Interpretability-Aware Vision-Language Prompt Tuning

Soumya Suvra Ghosal; Samyadeep Basu; Soheil Feizi; Dinesh Manocha

arXiv:2406.13683 · cs.CV · June 21, 2024

TL;DR

IntCoOp introduces an interpretable prompt-tuning method that incorporates compositional attributes to improve image-text alignment and few-shot learning performance in vision-language models.

Contribution

The paper proposes IntCoOp, a novel prompt-tuning approach that learns attribute-level inductive biases for better interpretability and improved performance over existing methods.

Findings

01 Outperforms state-of-the-art prompt-tuning frameworks.

02 Improves average performance by 7.35% in the 16-shot setting.

03 Enhances generalization to novel classes and domain shifts.

Abstract

Image-text contrastive models such as CLIP learn transferable and robust representations for zero-shot transfer to a variety of downstream tasks. However, to obtain strong downstream performance, prompts need to be carefully curated, which can be a tedious engineering task. To address the issue of manual prompt engineering, prompt tuning is used, in which a set of contextual vectors is learned by leveraging information from the training data. Despite their effectiveness, existing prompt-tuning frameworks often lack interpretability, which limits their ability to capture the compositional nature of images. In this work, we first identify that incorporating compositional attributes (e.g., a "green" tree frog) in the design of manual prompts can significantly enhance image-text alignment scores. Building upon this observation, we propose a novel and interpretable prompt-tuning method…
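The observation about compositional attributes can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (`build_prompt`, `cosine`, `classify`), the prompt template, the temperature value, and the toy feature vectors are all assumptions made for illustration. In practice the features would come from a CLIP image encoder and text encoder. The sketch only shows how an attribute word is inserted into a manual prompt and how CLIP-style scoring turns cosine similarities into class probabilities.

```python
import math


def build_prompt(class_name, attribute=None, template="a photo of a {}"):
    """Build a manual prompt, optionally inserting an attribute before the class name.

    The template is a hypothetical choice for this example.
    """
    phrase = f"{attribute} {class_name}" if attribute else class_name
    return template.format(phrase)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def classify(image_feat, text_feats, temperature=0.01):
    """CLIP-style scoring: softmax over cosine similarities divided by a temperature.

    The temperature of 0.01 is an assumed value, not taken from the paper.
    """
    logits = [cosine(image_feat, t) / temperature for t in text_feats]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Prompts with and without a compositional attribute.
    print(build_prompt("tree frog"))
    print(build_prompt("tree frog", attribute="green"))

    # Toy stand-ins for encoder outputs (hypothetical values).
    image_feat = [0.9, 0.1, 0.4]
    text_feats = [[0.8, 0.2, 0.5], [0.1, 0.9, 0.2]]
    print(classify(image_feat, text_feats))
```

Comparing `cosine(image_feat, text_feat)` for the prompt with and without the attribute is one way to measure the alignment gain the abstract refers to, given real encoder features.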

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Sparse Evolutionary Training · ALIGN · Contrastive Language-Image Pre-training · Context Optimization