MetaTPT: Meta Test-time Prompt Tuning for Vision-Language Models
Yuqing Lei, Yingjun Du, Yawen Huang, Xiantong Zhen, Ling Shao

TL;DR
MetaTPT is a meta-learning approach that helps vision-language models adapt to new domains at test time by learning per-sample augmentations together with prompts, with the aim of improving robustness and generalization under domain shift.
Contribution
The paper proposes a meta-learning framework that jointly learns parameterized data augmentations and prompts for test-time adaptation of vision-language models.
Findings
Achieves state-of-the-art results on domain generalization benchmarks.
Improves robustness of CLIP-like models under domain shifts.
Demonstrates effectiveness of learned augmentations in test-time prompt tuning.
Abstract
Vision-language models (VLMs) such as CLIP exhibit strong zero-shot generalization but remain sensitive to domain shifts at test time. Test-time prompt tuning (TPT) mitigates this issue by adapting prompts with fixed augmentations, which may falter in more challenging settings. In this work, we propose Meta Test-Time Prompt Tuning (MetaTPT), a meta-learning framework that learns a self-supervised auxiliary task to guide test-time prompt tuning. The auxiliary task dynamically learns parameterized augmentations for each sample, enabling more expressive transformations that capture essential features in target domains. MetaTPT adopts a dual-loop optimization paradigm: an inner loop learns a self-supervised task that generates informative views, while the outer loop performs prompt tuning by enforcing consistency across these views. By coupling augmentation learning with prompt tuning,…
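The outer-loop idea of tuning a prompt so that predictions agree across augmented views can be illustrated with a small toy. The sketch below is not the authors' implementation. It assumes the consistency objective is the entropy of the view-averaged prediction (the objective used in the original TPT work), replaces CLIP with a tiny linear classifier, and treats the prompt as a per-class bias. The names make_views, class_logits, tune_prompt, and the gain-scaling augmentation parameter theta are all hypothetical. The inner loop that would learn theta is left out because the abstract does not specify its self-supervised objective, so theta is held fixed here.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def marginal_entropy(view_logits):
    # Entropy of the class distribution averaged over all views.
    # Low values mean the views agree on a confident prediction.
    probs = [softmax(row) for row in view_logits]
    n_classes = len(probs[0])
    avg = [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]
    return -sum(p * math.log(p) for p in avg if p > 0.0)

def make_views(x, theta):
    # Hypothetical parameterized augmentation: three gain-scaled copies of x.
    # In MetaTPT the augmentation parameters are learned per sample; here
    # theta is simply given.
    return [[(1.0 + theta * s) * v for v in x] for s in (-1.0, 0.0, 1.0)]

def class_logits(view, class_weights, prompt):
    # Toy stand-in for a VLM: fixed linear class scores plus a tunable
    # per-class "prompt" bias (an assumption made for illustration).
    return [sum(a * b for a, b in zip(view, w)) + p
            for w, p in zip(class_weights, prompt)]

def loss(prompt, x, theta, class_weights):
    views = make_views(x, theta)
    return marginal_entropy([class_logits(v, class_weights, prompt) for v in views])

def tune_prompt(prompt, x, theta, class_weights, lr=0.2, steps=30, eps=1e-4):
    # Outer loop only: gradient descent on the prompt, with gradients
    # estimated by central finite differences to keep the sketch dependency-free.
    prompt = list(prompt)
    for _ in range(steps):
        grad = []
        for i in range(len(prompt)):
            up = prompt[:i] + [prompt[i] + eps] + prompt[i + 1:]
            down = prompt[:i] + [prompt[i] - eps] + prompt[i + 1:]
            grad.append((loss(up, x, theta, class_weights)
                         - loss(down, x, theta, class_weights)) / (2 * eps))
        prompt = [p - lr * g for p, g in zip(prompt, grad)]
    return prompt

# Toy usage: one test sample, three classes, fixed augmentation strength.
x = [1.0, 0.5]
class_weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
theta = 0.2
prompt_before = [0.0, 0.0, 0.0]
prompt_after = tune_prompt(prompt_before, x, theta, class_weights)
```

In a dual-loop version, an inner step would update theta before each prompt update, so that the views themselves change as adaptation proceeds.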
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis
