Learning to Prompt for Vision-Language Models

Kaiyang Zhou; Jingkang Yang; Chen Change Loy; Ziwei Liu

arXiv:2109.01134 · cs.CV · October 7, 2022

TL;DR

This paper introduces CoOp, a simple learnable prompt method for vision-language models like CLIP, significantly reducing prompt engineering effort and improving downstream image recognition performance with minimal training data.

Contribution

Proposes Context Optimization (CoOp), which models a prompt's context words as learnable vectors and adapts pre-trained vision-language models such as CLIP to downstream image recognition while keeping all pre-trained parameters fixed.
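The idea of learning only the prompt context can be sketched in a few lines. This is a toy illustration under loud assumptions, not the authors' implementation: `text_feature` is a hypothetical mean-pooling stand-in for CLIP's frozen text encoder, the embeddings and image features are made-up 3-d vectors, and finite differences replace backpropagation to keep the example dependency-free. Only the context vectors are updated; the class embeddings and the "encoder" never change.

```python
import math
import random

def normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def text_feature(context, class_emb):
    """Toy frozen text encoder (assumption): mean-pool the context tokens
    together with the class-name token, then L2-normalize."""
    tokens = context + [class_emb]
    pooled = [sum(t[d] for t in tokens) / len(tokens) for d in range(len(class_emb))]
    return normalize(pooled)

def class_probs(image_feat, context, class_embs, temperature=0.5):
    """Softmax over cosine similarities between the image and each class prompt."""
    img = normalize(image_feat)
    logits = [sum(a * b for a, b in zip(img, text_feature(context, c))) / temperature
              for c in class_embs]
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mean_loss(context, data, class_embs):
    """Average cross-entropy of the true class over the labelled images."""
    return -sum(math.log(class_probs(x, context, class_embs)[y]) for x, y in data) / len(data)

def train_context(data, class_embs, n_ctx=2, steps=300, lr=0.05, eps=1e-4, seed=0):
    """Gradient descent on the shared context vectors only.
    Gradients come from central finite differences (a stand-in for backprop)."""
    rng = random.Random(seed)
    dim = len(class_embs[0])
    context = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(n_ctx)]
    for _ in range(steps):
        grads = []
        for i in range(n_ctx):
            row = []
            for d in range(dim):
                original = context[i][d]
                context[i][d] = original + eps
                loss_up = mean_loss(context, data, class_embs)
                context[i][d] = original - eps
                loss_down = mean_loss(context, data, class_embs)
                context[i][d] = original
                row.append((loss_up - loss_down) / (2 * eps))
            grads.append(row)
        for i in range(n_ctx):
            for d in range(dim):
                context[i][d] -= lr * grads[i][d]
    return context

# Hypothetical frozen class-name embeddings and few-shot (image feature, label) pairs.
CLASS_EMBS = [[1.0, 0.5, 0.0], [1.0, -0.5, 0.0]]
DATA = [([0.5, 0.5, 0.1], 0), ([0.6, 0.4, -0.1], 0),
        ([0.5, -0.5, 0.1], 1), ([0.4, -0.6, -0.1], 1)]

zero_context = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # class name alone, no learned context
loss_before = mean_loss(zero_context, DATA, CLASS_EMBS)
learned = train_context(DATA, CLASS_EMBS)
loss_after = mean_loss(learned, DATA, CLASS_EMBS)
print("loss with zero context:   ", loss_before)
print("loss with learned context:", loss_after)
```

In a real setting the context vectors would be token embeddings fed through the actual pre-trained text encoder and optimized with automatic differentiation; the sketch only shows which quantities are trainable and which stay frozen.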

Findings

01. CoOp beats hand-crafted prompts with as few as one or two shots.

02. With 16 shots, CoOp's average gain over hand-crafted prompts is around 15%.

03. Despite being a learning-based approach, CoOp shows strong domain generalization compared with the zero-shot model using hand-crafted prompts.

Abstract

Large pre-trained vision-language models like CLIP have shown great potential in learning representations that are transferable across a wide range of downstream tasks. Unlike traditional representation learning, which is based mostly on discretized labels, vision-language pre-training aligns images and texts in a common feature space, which allows zero-shot transfer to a downstream task via prompting, i.e., classification weights are synthesized from natural language describing classes of interest. In this work, we show that a major challenge for deploying such models in practice is prompt engineering, which requires domain expertise and is extremely time-consuming: one needs to spend a significant amount of time tuning the wording, since a slight change in wording could have a huge impact on performance. Inspired by recent advances in prompt learning research in natural…
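To make "classification weights are synthesized from natural language" concrete, here is a minimal sketch of zero-shot classification via prompting. Everything in it is an assumption for illustration: `WORD_VECS` is a hypothetical table of 3-d word embeddings, `encode_text` is a toy averaging encoder rather than CLIP's text encoder, and the template string is just one possible hand-written prompt.

```python
import math

# Hypothetical word embeddings standing in for a real pre-trained text encoder.
WORD_VECS = {
    "a": [0.1, 0.1, 0.1],
    "photo": [0.2, 0.2, 0.0],
    "of": [0.1, 0.0, 0.1],
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
}

def normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def encode_text(prompt):
    """Toy text encoder (assumption): average the word vectors, then normalize."""
    vecs = [WORD_VECS[w] for w in prompt.lower().split()]
    mean = [sum(v[d] for v in vecs) / len(vecs) for d in range(len(vecs[0]))]
    return normalize(mean)

def build_classifier(class_names, template="a photo of a {}"):
    """One weight vector per class, produced purely from the prompt text."""
    return [encode_text(template.format(name)) for name in class_names]

def zero_shot_predict(image_feat, class_names, template="a photo of a {}"):
    """Pick the class whose text feature has the highest cosine similarity."""
    weights = build_classifier(class_names, template)
    img = normalize(image_feat)
    scores = [sum(a * b for a, b in zip(img, w)) for w in weights]
    return class_names[scores.index(max(scores))]

print(zero_shot_predict([0.9, 0.2, 0.1], ["cat", "dog"]))  # cat
print(zero_shot_predict([0.1, 0.8, 0.0], ["cat", "dog"]))  # dog
```

Because the classifier weights depend entirely on the template string, changing the wording changes the weights, which is why prompt wording can affect accuracy.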

Code & Models

Models

🤗 tongyujun/Subspace_Prompting (model)

Taxonomy

Methods: Contrastive Language-Image Pre-training · Context Optimization