In-context Prompt Learning for Test-time Vision Recognition with Frozen Vision-language Model

Junhui Yin; Xinyu Zhang; Lin Wu; Xiaojie Wang

arXiv:2403.06126 · cs.CV · August 20, 2024 · 1 cites

TL;DR

This paper introduces In-Context Prompt Learning (InCPL), a method that enables pre-trained vision-language models like CLIP to adapt to new tasks at test time using minimal labeled examples and unsupervised prompt tuning.

Contribution

InCPL leverages in-context learning principles for visual recognition, employing a language-to-vision translator and a cyclic prompt optimization strategy for effective test-time adaptation.

Findings

01 Achieves state-of-the-art results on multiple datasets.

02 Effectively adapts to new tasks with very few labeled examples.

03 Outperforms existing test-time adaptation methods.

Abstract

Current pre-trained vision-language models, such as CLIP, have demonstrated remarkable zero-shot generalization across a variety of downstream tasks. However, their performance degrades significantly when test inputs follow a different distribution. In this paper, we explore test-time prompt tuning (TTPT), which adapts the CLIP model to novel downstream tasks through a one-step unsupervised optimization that involves only test samples. Inspired by in-context learning in natural language processing (NLP), we propose In-Context Prompt Learning (InCPL) for test-time visual recognition tasks, which equips a pre-trained vision-language model with labeled examples as context information for the downstream task. Specifically, InCPL associates a new test sample with very few labeled examples (sometimes just one) as context information, enabling…
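To make the idea of a one-step unsupervised prompt update more concrete, here is a minimal sketch in plain Python. It is not the InCPL method and does not use CLIP. It assumes toy random vectors in place of frozen encoder features, a single additive prompt vector, prediction entropy as the unsupervised objective (a common choice in test-time prompt tuning, but an assumption here since the abstract does not name the loss), and a finite-difference gradient in place of backpropagation. All function names and parameters (class_probs, one_step_prompt_update, temperature, lr, eps) are hypothetical.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    # Shannon entropy of a probability vector (small constant avoids log(0)).
    return -sum(p * math.log(p + 1e-12) for p in probs)


def normalize(vec):
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def class_probs(prompt, image_feat, class_feats, temperature=0.1):
    # Assumption: the learnable prompt shifts every frozen class (text) feature.
    # Class scores are temperature-scaled cosine similarities to the image feature.
    image = normalize(image_feat)
    logits = []
    for feat in class_feats:
        text = normalize([f + p for f, p in zip(feat, prompt)])
        logits.append(sum(t * i for t, i in zip(text, image)) / temperature)
    return softmax(logits)


def one_step_prompt_update(prompt, image_feat, class_feats, lr=1e-3, eps=1e-5):
    # One gradient-descent step on prediction entropy for a single unlabeled
    # test sample. The gradient is estimated with central differences; the
    # frozen features are never modified.
    grad = []
    for i in range(len(prompt)):
        up = list(prompt)
        up[i] += eps
        down = list(prompt)
        down[i] -= eps
        h_up = entropy(class_probs(up, image_feat, class_feats))
        h_down = entropy(class_probs(down, image_feat, class_feats))
        grad.append((h_up - h_down) / (2 * eps))
    return [p - lr * g for p, g in zip(prompt, grad)]


if __name__ == "__main__":
    rng = random.Random(0)
    class_feats = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(3)]
    image_feat = [rng.gauss(0, 1) for _ in range(8)]
    prompt = [0.0] * 8  # start from a neutral prompt

    new_prompt = one_step_prompt_update(prompt, image_feat, class_feats)
    print("entropy before:", entropy(class_probs(prompt, image_feat, class_feats)))
    print("entropy after: ", entropy(class_probs(new_prompt, image_feat, class_feats)))
```

Only the prompt vector changes; the stand-in encoder features stay fixed, which mirrors the frozen-model setting described in the abstract. In a real setup the features would come from the frozen CLIP encoders and the gradient from automatic differentiation.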

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications

Methods: Contrastive Language-Image Pre-training