Unsupervised Prototype Adapter for Vision-Language Models

Yi Zhang; Ce Zhang; Xueting Hu; Zhihai He

arXiv:2308.11507 · cs.CV · August 28, 2023

TL;DR

This paper introduces an unsupervised fine-tuning method, the Unsupervised Prototype Adapter (UP-Adapter), for vision-language models such as CLIP, enabling effective downstream recognition without annotated data.

Contribution

The paper proposes a novel unsupervised approach to adapt vision-language models using automatically selected samples and class prototypes, eliminating the need for labeled data.

Findings

01. Outperforms 8-shot CoOp and Tip-Adapter in image recognition tasks.
02. Achieves superior results on domain generalization benchmarks.
03. Demonstrates effectiveness without requiring annotated datasets.

Abstract

Recently, large-scale pre-trained vision-language models (e.g. CLIP and ALIGN) have demonstrated remarkable effectiveness in acquiring transferable visual representations. To leverage the valuable knowledge encoded within these models for downstream tasks, several fine-tuning approaches, including prompt tuning methods and adapter-based methods, have been developed to adapt vision-language models effectively with supervision. However, these methods rely on the availability of annotated samples, which can be labor-intensive and time-consuming to acquire, thus limiting scalability. To address this issue, in this work, we design an unsupervised fine-tuning approach for vision-language models called Unsupervised Prototype Adapter (UP-Adapter). Specifically, for the unannotated target datasets, we leverage the text-image aligning capability of CLIP to automatically select the most confident…
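The abstract is cut off before the method is fully described, so what follows is only a minimal sketch of the general idea stated above: use CLIP's zero-shot text-image scores to pick the most confident unlabeled images per class, then average them into class prototypes. It is not the authors' UP-Adapter implementation. It assumes pre-computed, L2-normalized CLIP image and text features stored as NumPy arrays. The helper names, the top-k selection rule, the temperature of 100, and the alpha-weighted fusion of zero-shot and prototype scores are all hypothetical choices made for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1):
    # Scale each row to unit length; CLIP features are compared by cosine similarity.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def select_confident_samples(image_feats, text_feats, k, temperature=100.0):
    """For each class, return indices of the k unlabeled images with the highest
    zero-shot probability for that class (assumed confidence criterion)."""
    logits = temperature * image_feats @ text_feats.T   # (N, C) zero-shot logits
    logits -= logits.max(axis=1, keepdims=True)          # subtract row max for stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)            # softmax over classes
    top = np.argsort(-probs, axis=0)[:k]                 # (k, C): best images per class
    return top.T                                         # (C, k)


def build_prototypes(image_feats, selected):
    # Average the selected image features of each class, then re-normalize.
    protos = image_feats[selected].mean(axis=1)          # (C, k, D) -> (C, D)
    return l2_normalize(protos)


def predict(query_feats, text_feats, prototypes, alpha=0.5, temperature=100.0):
    # Hypothetical residual-style fusion: zero-shot logits plus weighted prototype similarity.
    zero_shot = temperature * query_feats @ text_feats.T
    proto = temperature * query_feats @ prototypes.T
    return (zero_shot + alpha * proto).argmax(axis=1)
```

Keeping the zero-shot logits and adding the prototype term on top means the prototypes only refine CLIP's original predictions; with `alpha=0` the sketch reduces to plain zero-shot CLIP.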

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling

Methods: Context Optimization · Contrastive Language-Image Pre-training · Residual Connection · Adapter