MetaTPT: Meta Test-time Prompt Tuning for Vision-Language Models

Yuqing Lei; Yingjun Du; Yawen Huang; Xiantong Zhen; Ling Shao

arXiv:2512.12268 · cs.CV · December 16, 2025

TL;DR

MetaTPT is a meta-learning approach that helps vision-language models adapt to new domains at test time by learning sample-specific augmentations alongside prompts, which improves robustness and generalization under domain shift.

Contribution

The paper proposes a meta-learning framework that jointly learns data augmentations and prompts to improve test-time adaptation in vision-language models.
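Joint learning of augmentations and prompts can be pictured as the two nested loops that the abstract describes: an inner loop that fits per-sample augmentation parameters with a self-supervised loss, and an outer loop that tunes the prompt on the resulting views. The toy sketch below is not the authors' implementation. The feature vectors, the prompt that rescales class embeddings, the noise-based augmentation `make_views`, and the placeholder loss `ssl_loss` are all hypothetical. The outer objective borrows TPT's marginal entropy because this page does not state MetaTPT's exact consistency loss. Finite differences stand in for autograd so that only the standard library is needed.

```python
import math
import random

# Toy stand-ins (hypothetical, not MetaTPT's architecture): a test sample is a
# feature vector, each class has a fixed "text" embedding, and the prompt p
# rescales every class embedding feature-wise.

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def predict(v, class_emb, p):
    # logit_c = sum_j v_j * t_cj * (1 + p_j)
    return softmax([sum(vj * tj * (1 + pj) for vj, tj, pj in zip(v, t, p))
                    for t in class_emb])

def make_views(x, phi, noise):
    # Hypothetical parameterized augmentation: view_k = x + phi * noise_k,
    # where phi holds one learnable strength per feature.
    return [[xj + fj * nj for xj, fj, nj in zip(x, phi, n)] for n in noise]

def ssl_loss(phi, x, noise, beta=0.1):
    # Placeholder self-supervised loss (invented): keep each view's squared norm
    # near the original's while rewarding distance from x (view diversity).
    views = make_views(x, phi, noise)
    sq = lambda v: sum(a * a for a in v)
    keep = sum((sq(v) - sq(x)) ** 2 for v in views) / len(views)
    spread = sum(sq([a - b for a, b in zip(v, x)]) for v in views) / len(views)
    return keep - beta * spread

def marginal_entropy(p, views, class_emb):
    # Entropy of the view-averaged prediction (TPT's loss, used as a stand-in).
    probs = [predict(v, class_emb, p) for v in views]
    mean = [sum(col) / len(probs) for col in zip(*probs)]
    return -sum(m * math.log(m) for m in mean)

def num_grad(f, params, eps=1e-5):
    # Central finite differences in place of autograd.
    grad = []
    for i in range(len(params)):
        up = params[:i] + [params[i] + eps] + params[i + 1:]
        down = params[:i] + [params[i] - eps] + params[i + 1:]
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad

def descend(f, params, lr, steps):
    # Gradient descent with backtracking: a step is kept only if f does not rise.
    for _ in range(steps):
        g, cur, step = num_grad(f, params), f(params), lr
        while step > 1e-8:
            cand = [a - step * b for a, b in zip(params, g)]
            if f(cand) <= cur:
                params = cand
                break
            step *= 0.5
    return params

def adapt_sample(x, class_emb, n_views=8, inner_steps=20, outer_steps=10, seed=0):
    rng = random.Random(seed)
    noise = [[rng.gauss(0.0, 1.0) for _ in x] for _ in range(n_views)]
    # Inner loop: learn this sample's augmentation strengths phi.
    phi = descend(lambda ph: ssl_loss(ph, x, noise), [0.1] * len(x), 0.05, inner_steps)
    views = make_views(x, phi, noise)
    # Outer loop: tune the prompt so predictions on the learned views agree.
    loss = lambda p: marginal_entropy(p, views, class_emb)
    p0 = [0.0] * len(x)
    p = descend(loss, p0, 0.5, outer_steps)
    return phi, p, loss(p0), loss(p)

x = [0.9, -0.2, 0.4, 0.1]
class_emb = [[1.0, 0.0, 0.5, 0.0], [0.0, 1.0, -0.5, 0.2], [-0.5, 0.3, 0.0, 1.0]]
phi, prompt, before, after = adapt_sample(x, class_emb)
print(f"marginal entropy before {before:.3f}, after {after:.3f}")
```

Here the loops run one after the other for a single test sample. A true meta-learning (bilevel) setup would also differentiate the outer loss through the inner update, which this sketch omits. The backtracking in `descend` only guarantees that neither loss increases, which keeps the toy stable.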

Findings

1. Achieves state-of-the-art results on domain generalization benchmarks.
2. Improves robustness of CLIP-like models under domain shifts.
3. Demonstrates effectiveness of learned augmentations in test-time prompt tuning.

Abstract

Vision-language models (VLMs) such as CLIP exhibit strong zero-shot generalization but remain sensitive to domain shifts at test time. Test-time prompt tuning (TPT) mitigates this issue by adapting prompts with fixed augmentations, which may falter in more challenging settings. In this work, we propose Meta Test-Time Prompt Tuning (MetaTPT), a meta-learning framework that learns a self-supervised auxiliary task to guide test-time prompt tuning. The auxiliary task dynamically learns parameterized augmentations for each sample, enabling more expressive transformations that capture essential features in target domains. MetaTPT adopts a dual-loop optimization paradigm: an inner loop learns a self-supervised task that generates informative views, while the outer loop performs prompt tuning by enforcing consistency across these views. By coupling augmentation learning with prompt tuning,…
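For comparison, the TPT baseline named in the abstract tunes the prompt on fixed augmentations. In the original TPT method, each test image is expanded into many augmented views, only the most confident fraction rho of them (lowest prediction entropy; 0.1 by default there) is kept, and the prompt is updated to minimize the entropy of their averaged prediction. The sketch below computes that objective from precomputed per-view probabilities; the view data are made up, the ceiling-based rounding of the kept count is this sketch's choice, and the gradient step on the prompt is left out.

```python
import math

def entropy(probs):
    # Shannon entropy (in nats) of one class-probability vector.
    return -sum(q * math.log(q) for q in probs if q > 0)

def tpt_objective(view_probs, rho=0.1):
    # Confidence selection: keep the ceil(rho * N) lowest-entropy views.
    k = max(1, math.ceil(rho * len(view_probs)))
    confident = sorted(view_probs, key=entropy)[:k]
    # Average the selected predictions and score the average's entropy;
    # TPT minimizes this value with respect to the prompt.
    mean = [sum(col) / k for col in zip(*confident)]
    return entropy(mean), confident

# Hypothetical per-view predictions for one test image over three classes.
view_probs = [
    [0.9, 0.05, 0.05],
    [0.4, 0.3, 0.3],
    [0.8, 0.1, 0.1],
    [0.34, 0.33, 0.33],
]
loss, confident = tpt_objective(view_probs, rho=0.5)
print(f"selected {len(confident)} views, marginal entropy {loss:.3f}")
```

In MetaTPT, the views scored by a consistency objective come from learned, sample-specific augmentations rather than a fixed augmentation pipeline.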

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis