In-context Prompt Learning for Test-time Vision Recognition with Frozen Vision-language Model
Junhui Yin, Xinyu Zhang, Lin Wu, Xiaojie Wang

TL;DR
This paper introduces In-Context Prompt Learning (InCPL), a method that enables pre-trained vision-language models such as CLIP to adapt to new tasks at test time using very few labeled examples and unsupervised prompt tuning.
Contribution
InCPL leverages in-context learning principles for visual recognition, employing a language-to-vision translator and a cyclic prompt optimization strategy for effective test-time adaptation.
Findings
Achieves state-of-the-art results on multiple datasets.
Effectively adapts to new tasks with very few labeled examples.
Outperforms existing test-time adaptation methods.
Abstract
Current pre-trained vision-language models, such as CLIP, have demonstrated remarkable zero-shot generalization capabilities across various downstream tasks. However, their performance degrades significantly when test inputs follow a different distribution. In this paper, we explore test-time prompt tuning (TTPT), which adapts the CLIP model to novel downstream tasks through a one-step unsupervised optimization that involves only test samples. Inspired by in-context learning in natural language processing (NLP), we propose In-Context Prompt Learning (InCPL) for test-time visual recognition tasks, which equips a pre-trained vision-language model with labeled examples as context information for the downstream task. Specifically, InCPL associates a new test sample with very few labeled examples (sometimes just one) as context information, enabling…
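The abstract describes test-time prompt tuning as a one-step unsupervised optimization that uses only test samples. The toy sketch below illustrates that idea under stated assumptions; it is not InCPL's implementation. It assumes prediction-entropy minimization as the unsupervised objective (the objective commonly used in test-time prompt tuning; this page does not state InCPL's own loss), and it replaces CLIP's learnable text tokens with a hypothetical prompt vector `u` that rescales each dimension of the class embeddings. The names `one_step_tune`, `class_logits`, and `prediction_entropy` are illustrative only, and InCPL's in-context examples, language-to-vision translator, and cyclic optimization are left out.

```python
import math
from typing import List

Vector = List[float]


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract the max logit for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def entropy(p: Vector) -> float:
    return -sum(pk * math.log(pk) for pk in p if pk > 0.0)


def class_logits(x: Vector, class_embs: List[Vector], u: Vector, scale: float) -> Vector:
    # Hypothetical prompt: u rescales every feature dimension of each class embedding,
    # so z_k = scale * sum_d x_d * c_kd * (1 + u_d).
    return [scale * sum(xd * cd * (1.0 + ud) for xd, cd, ud in zip(x, c, u))
            for c in class_embs]


def prediction_entropy(x: Vector, class_embs: List[Vector], u: Vector, scale: float = 1.0) -> float:
    return entropy(softmax(class_logits(x, class_embs, u, scale)))


def one_step_tune(x: Vector, class_embs: List[Vector], u: Vector,
                  scale: float = 1.0, lr: float = 0.1) -> Vector:
    """Take one gradient step on u that lowers the prediction entropy for x; no labels are used."""
    p = softmax(class_logits(x, class_embs, u, scale))
    h = entropy(p)
    # Gradient of H = -sum_k p_k log p_k with respect to the logits: dH/dz_k = -p_k (log p_k + H).
    dh_dz = [-pk * (math.log(pk) + h) if pk > 0.0 else 0.0 for pk in p]
    # Chain rule through the logits: dz_k/du_d = scale * x_d * c_kd.
    grad = [0.0] * len(u)
    for g, c in zip(dh_dz, class_embs):
        for d in range(len(u)):
            grad[d] += g * scale * x[d] * c[d]
    return [ud - lr * gd for ud, gd in zip(u, grad)]


if __name__ == "__main__":
    x = [1.0, 0.5, -0.5]  # toy image feature of one unlabeled test sample
    class_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # toy class text features
    u0 = [0.0, 0.0, 0.0]
    u1 = one_step_tune(x, class_embs, u0)
    print(prediction_entropy(x, class_embs, u0), prediction_entropy(x, class_embs, u1))
```

With a small learning rate, the second printed entropy is lower than the first: the single update makes the prediction for the test sample more confident without using any label. In TTPT-style methods the tuned prompt is typically used only for that test sample and then reset.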
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
Methods: Contrastive Language-Image Pre-training
