Compound Text-Guided Prompt Tuning via Image-Adaptive Cues

Hao Tan; Jun Li; Yizhuang Zhou; Jun Wan; Zhen Lei; Xiangyu Zhang

arXiv:2312.06401·cs.CV·December 12, 2023·2 cites


TL;DR

This paper introduces TGP-T, a novel prompt tuning method for vision-language models that reduces resource consumption and improves performance by using compound text supervision and visual feature conditioning.

Contribution

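TGP-T combines compound text supervision with visual feature conditioning to make prompt tuning cheaper and more accurate. It targets two limitations of earlier methods: the heavy GPU memory cost of running learnable textual inputs for every category in parallel, and the dependence on category names inside prompts, which hurts performance when those names are ambiguous.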

Findings

01 Reduces GPU memory usage by 93%.

02 Achieves 2.5% performance gain on 16-shot ImageNet.

03 Outperforms existing prompt tuning methods in few-shot recognition and domain generalization.

Abstract

Vision-Language Models (VLMs) such as CLIP have demonstrated remarkable generalization capabilities to downstream tasks. However, existing prompt-tuning-based frameworks need to parallelize learnable textual inputs for all categories, and so suffer from massive GPU memory consumption when the target dataset has a large number of categories. Moreover, previous works require category names to be included in prompts, and perform poorly when category names are ambiguous. To address these shortcomings, we propose Compound Text-Guided Prompt Tuning (TGP-T), which significantly reduces resource demand while achieving superior performance. We introduce text supervision to the optimization of prompts, which brings two benefits: 1) releasing the model from its reliance on pre-defined category names during inference, thereby enabling more flexible prompt generation; 2) reducing…
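
The memory argument in the abstract can be made concrete with a rough count. The sketch below is not taken from the paper or its repository: it only counts the scalar elements in the token-embedding batch fed to a text encoder, comparing one prompt per category against a small fixed number of category-free prompts. The function name, the context length of 77, the embedding width of 512, and the choice of 2 category-free prompts are all illustrative assumptions, and the resulting ratio is not meant to reproduce the reported 93% figure, which also depends on activations and the rest of the model.

```python
def text_encoder_input_elements(num_prompts: int,
                                context_len: int = 77,
                                embed_dim: int = 512) -> int:
    """Count scalar elements in a batch of prompt token embeddings.

    context_len and embed_dim are assumed values for illustration.
    """
    return num_prompts * context_len * embed_dim


# Per-category prompting: one prompt sequence per class.
# 1000 is the number of ImageNet-1k classes.
per_category = text_encoder_input_elements(num_prompts=1000)

# Category-free prompting: the prompt count does not grow with the
# number of classes. The value 2 is a hypothetical choice.
category_free = text_encoder_input_elements(num_prompts=2)

# How many times larger the per-category input batch is.
ratio = per_category / category_free

print(per_category, category_free, ratio)
```

The point of the comparison is the scaling behaviour: the first count grows linearly with the number of categories, while the second stays constant regardless of dataset size.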


Code & Models

Repositories

erictan7/tgp-t (PyTorch, official)

Videos

Compound Text-Guided Prompt Tuning via Image-Adaptive Cues (Underline)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques

Methods: Contrastive Language-Image Pre-training