Conditional Prompt Learning for Vision-Language Models

Kaiyang Zhou; Jingkang Yang; Chen Change Loy; Ziwei Liu

arXiv:2203.05557 · cs.CV · October 7, 2022 · 73 cites

TL;DR

This paper introduces CoCoOp, a dynamic prompt learning method for vision-language models that improves generalization to unseen classes and domains by generating input-conditional prompts, addressing overfitting issues of previous static prompt methods.

Contribution

The paper proposes Conditional Context Optimization (CoCoOp), a dynamic prompt learning approach that generates input-conditional prompts, improving how well vision-language models generalize to unseen classes and transfer to other datasets and domains.

Findings

01. CoCoOp outperforms CoOp on unseen classes within the same dataset.

02. CoCoOp transfers better than CoOp across different datasets.

03. CoCoOp improves domain generalization performance over CoOp.

Abstract

With the rise of powerful pre-trained vision-language models like CLIP, it becomes essential to investigate ways to adapt these models to downstream datasets. A recently proposed method named Context Optimization (CoOp) introduces the concept of prompt learning -- a recent trend in NLP -- to the vision domain for adapting pre-trained vision-language models. Specifically, CoOp turns context words in a prompt into a set of learnable vectors and, with only a few labeled images for learning, can achieve huge improvements over intensively-tuned manual prompts. In our study we identify a critical problem of CoOp: the learned context is not generalizable to wider unseen classes within the same dataset, suggesting that CoOp overfits base classes observed during training. To address the problem, we propose Conditional Context Optimization (CoCoOp), which extends CoOp by further learning a…
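The difference between a static learned context and an input-conditional one can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the conditioning takes the form of an image-dependent shift added to every context vector, and `toy_text_encoder`, `toy_meta_net`, `build_prompt`, and `class_probabilities` are hypothetical names. The frozen text encoder is replaced by mean pooling purely to keep the example self-contained, and training of the context vectors is omitted.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v)) or 1.0
    return [a / norm for a in v]


def toy_text_encoder(prompt_tokens):
    """Stand-in for a frozen text encoder (assumption): mean-pool the tokens."""
    dim = len(prompt_tokens[0])
    count = len(prompt_tokens)
    pooled = [sum(tok[i] for tok in prompt_tokens) / count for i in range(dim)]
    return normalize(pooled)


def toy_meta_net(image_feature, weight):
    """Hypothetical conditioning network: a single linear map followed by tanh."""
    return [math.tanh(dot(row, image_feature)) for row in weight]


def build_prompt(context, class_embedding, shift=None):
    """Prepend the context vectors to the class-name embedding.

    With shift=None the context is static (CoOp-style). With a shift vector,
    every context vector is offset by the same image-dependent amount.
    """
    if shift is not None:
        context = [[c + s for c, s in zip(vec, shift)] for vec in context]
    return context + [class_embedding]


def class_probabilities(image_feature, context, class_embeddings,
                        meta_weight=None, temperature=0.1):
    """Softmax over cosine similarities between the image and each class prompt."""
    shift = None
    if meta_weight is not None:
        shift = toy_meta_net(image_feature, meta_weight)
    image_feature = normalize(image_feature)
    logits = []
    for embedding in class_embeddings:
        text_feature = toy_text_encoder(build_prompt(context, embedding, shift))
        logits.append(dot(image_feature, text_feature) / temperature)
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy data: random vectors stand in for real image and word embeddings.
rng = random.Random(0)
DIM, NUM_CONTEXT, NUM_CLASSES = 8, 4, 3
context = [[rng.gauss(0, 0.02) for _ in range(DIM)] for _ in range(NUM_CONTEXT)]
class_embeddings = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_CLASSES)]
meta_weight = [[rng.gauss(0, 0.5) for _ in range(DIM)] for _ in range(DIM)]
image_feature = [rng.gauss(0, 1) for _ in range(DIM)]

static_probs = class_probabilities(image_feature, context, class_embeddings)
conditional_probs = class_probabilities(
    image_feature, context, class_embeddings, meta_weight
)
print("static:     ", [round(p, 3) for p in static_probs])
print("conditional:", [round(p, 3) for p in conditional_probs])
```

In the static case the same context vectors are reused for every image, so whatever they encode is fixed after training. In the conditional case the prompt changes per image, which is the property the summary above credits for the improved generalization.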

Code & Models

Models

🤗 tongyujun/Subspace_Prompting (model)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition

Methods: Context Optimization · Balanced Selection · Contrastive Language-Image Pre-training