Unsupervised Prompt Learning for Vision-Language Models

Tony Huang; Jack Chu; Fangyun Wei

arXiv:2204.03649 · cs.CV · August 23, 2022 · 54 cites

TL;DR

This paper introduces an unsupervised prompt learning (UPL) method for vision-language models such as CLIP. It requires neither labeled target data nor manual prompt engineering, and it outperforms the original CLIP with prompt engineering on ImageNet and 10 other datasets.

Contribution

To the authors' knowledge, UPL is the first work to incorporate unsupervised learning into prompt learning, improving CLIP's transfer performance without requiring labeled data from the target dataset.

Findings

01 Outperforms the original CLIP with prompt engineering on ImageNet and 10 other datasets.

02 Competitive with the 8-shot CoOp and Tip-Adapter methods.

03 Demonstrates the effectiveness of unsupervised prompt learning for vision-language models.

Abstract

Contrastive vision-language models like CLIP have shown great progress in transfer learning. In the inference stage, the proper text description, also known as prompt, needs to be carefully designed to correctly classify the given images. In order to avoid laborious prompt engineering, recent works such as CoOp, CLIP-Adapter and Tip-Adapter propose to adapt vision-language models for downstream image recognition tasks on a small set of labeled data. Though promising improvements are achieved, requiring labeled data from the target datasets may restrict the scalability. In this paper, we explore a different scenario, in which the labels of the target datasets are unprovided, and we present an unsupervised prompt learning (UPL) approach to avoid prompt engineering while simultaneously improving transfer performance of CLIP-like vision-language models. As far as we know, UPL is the first…
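The abstract notes that at inference time a CLIP-like model classifies an image by comparing it against a text prompt for each class. The sketch below illustrates only that generic zero-shot step, not the UPL method itself. The toy feature vectors, the prompt template "a photo of a {name}.", the `temperature` value, and the function names are all assumptions made for illustration; a real pipeline would obtain the features from the model's image and text encoders.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_feature, prompt_features, temperature=0.01):
    """Return (predicted class index, class probabilities).

    `temperature` is an assumed value; CLIP-like models learn this scale.
    """
    img = l2_normalize(image_feature)
    logits = []
    for text_feature in prompt_features:
        txt = l2_normalize(text_feature)
        cosine = sum(a * b for a, b in zip(img, txt))
        logits.append(cosine / temperature)
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


# Hypothetical class names and a hand-written prompt template (assumption).
class_names = ["cat", "dog", "car"]
prompts = [f"a photo of a {name}." for name in class_names]

# Toy stand-ins for encoder outputs; real features would come from the model.
toy_prompt_features = [[1.0, 0.1, 0.0], [0.1, 1.0, 0.0], [0.0, 0.0, 1.0]]
toy_image_feature = [0.2, 0.9, 0.1]

pred, probs = zero_shot_classify(toy_image_feature, toy_prompt_features)
print(prompts[pred])
```

Because the prediction depends entirely on the text features, the wording of each prompt matters, which is why prompt engineering is laborious and why learned prompts are attractive.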

Code & Models

Repositories

tonyhuang2022/upl (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling

Methods: Adapter · Context Optimization · Contrastive Language-Image Pre-training