Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling

Renrui Zhang; Rongyao Fang; Wei Zhang; Peng Gao; Kunchang Li; Jifeng Dai; Yu Qiao; Hongsheng Li

arXiv:2111.03930·cs.CV·November 16, 2021·128 cites

TL;DR

Tip-Adapter builds a visual feature adapter for CLIP from a key-value cache of few-shot training data, performing comparably to or better than CLIP-Adapter without any additional training.

Contribution

The paper introduces a cache-based adapter for CLIP that improves few-shot classification without backpropagation: the adapter weights are created directly from a key-value cache model rather than learned by training.

Findings

01 Performs comparably to, or better than, CLIP-Adapter on few-shot classification.

02 Reaches these results without any training of the adapter.

03 Fine-tuning the adapter further improves performance.

Abstract

Contrastive Vision-Language Pre-training, known as CLIP, has provided a new paradigm for learning visual representations by using large-scale contrastive image-text pairs. It shows impressive performance on zero-shot knowledge transfer to downstream tasks. To further enhance CLIP's few-shot capability, CLIP-Adapter proposed to fine-tune a lightweight residual feature adapter and significantly improves the performance for few-shot classification. However, such a process still needs extra training and computational resources. In this paper, we propose Training-Free CLIP-Adapter (Tip-Adapter), which not only inherits CLIP's training-free advantage but also performs comparably or even better than CLIP-Adapter. Tip-Adapter does not require any back propagation for training the adapter, but creates the weights by a key-value cache model constructed from the…
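The key-value cache idea in the abstract can be illustrated with a short NumPy sketch. This is not the authors' code, and the abstract is truncated here, so several details are assumptions: the function names `build_cache` and `tip_adapter_logits` are hypothetical, the keys are taken to be L2-normalized few-shot image features, the values are one-hot labels, the affinity function is `exp(-beta * (1 - cosine))`, and the cache prediction is added to CLIP's zero-shot logits with weight `alpha`. The default `alpha`, `beta`, and the logit scale of 100 are placeholder values only.

```python
import numpy as np


def l2_normalize(x, axis=-1):
    """Scale each vector to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def build_cache(train_feats, train_labels, num_classes):
    """Build the cache without any training (hypothetical helper).

    train_feats:  (N, D) image features of the few-shot training set
    train_labels: (N,) integer class ids
    Returns keys (N, D) and one-hot values (N, C).
    """
    keys = l2_normalize(np.asarray(train_feats, dtype=float))
    values = np.eye(num_classes)[np.asarray(train_labels)]
    return keys, values


def tip_adapter_logits(test_feats, keys, values, text_weights,
                       alpha=1.0, beta=5.5, logit_scale=100.0):
    """Combine cache-based logits with zero-shot logits (assumed formulation).

    text_weights: (C, D) text-encoder class embeddings (the zero-shot classifier).
    alpha, beta, logit_scale are assumed hyperparameters, not values from the source.
    """
    f = l2_normalize(np.asarray(test_feats, dtype=float))
    affinity = f @ keys.T                                  # (M, N) cosine similarity
    cache_logits = np.exp(-beta * (1.0 - affinity)) @ values  # (M, C)
    clip_logits = logit_scale * f @ l2_normalize(np.asarray(text_weights, dtype=float)).T
    return clip_logits + alpha * cache_logits
```

No gradient step appears anywhere in the sketch: the "adapter weights" are simply the cached features and labels, which is what makes the approach training-free. Calling `build_cache` once on the few-shot set and then `tip_adapter_logits` on each batch of test features is the whole pipeline under these assumptions.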

Code & Models

Repositories

gaopengcuhk/tip-adapter
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications

Methods: Adapter · Contrastive Language-Image Pre-training