Doubly Debiased Test-Time Prompt Tuning for Vision-Language Models
Fei Song, Yi Li, Rui Wang, Jiahuan Zhou, Changwen Zheng, Jiangmeng Li

TL;DR
This paper introduces a test-time prompt tuning method for vision-language models that reduces prompt optimization bias through dynamic knowledge retrieval and confidence-based regularization, improving generalization across diverse datasets.
Contribution
The paper proposes a doubly debiased prompt tuning approach combining knowledge retrieval and confidence-aware regularization to mitigate prompt optimization bias in vision-language models.
Findings
Outperforms baselines on 15 benchmark datasets.
Effectively reduces prompt optimization bias.
Enhances generalization under distribution shifts.
Abstract
Test-time prompt tuning for vision-language models has demonstrated impressive generalization capabilities under zero-shot settings. However, tuning the learnable prompts solely based on unlabeled test data may induce prompt optimization bias, ultimately leading to suboptimal performance on downstream tasks. In this work, we analyze the underlying causes of prompt optimization bias from both the model and data perspectives. In terms of the model, the entropy minimization objective typically focuses on reducing the entropy of model predictions while overlooking their correctness. This can result in overconfident yet incorrect outputs, thereby compromising the quality of prompt optimization. On the data side, prompts affected by optimization bias can introduce misalignment between visual and textual modalities, which further aggravates the prompt optimization bias. To this end, we propose…
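The model-side point in the abstract, that entropy minimization rewards confidence regardless of correctness, can be illustrated with a small numeric sketch. This is not the paper's method; the logits, the three-class setup, and the helper names (`softmax`, `entropy`, `predicted`) are assumptions made up for illustration.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; this is the quantity entropy minimization drives down."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def predicted(probs):
    """Index of the most probable class."""
    return max(range(len(probs)), key=probs.__getitem__)

# Hypothetical 3-class example; the true class is index 0.
true_label = 0
confident_wrong = softmax([0.0, 6.0, 0.0])  # sharply peaked on the wrong class
hesitant_right = softmax([1.0, 0.5, 0.0])   # mildly peaked on the right class

for name, probs in [("confident_wrong", confident_wrong),
                    ("hesitant_right", hesitant_right)]:
    print(name,
          "entropy=%.3f" % entropy(probs),
          "correct=%s" % (predicted(probs) == true_label))
```

The entropy objective alone prefers the first prediction, because its entropy is lower, even though it is wrong. An unlabeled test-time objective cannot tell the two cases apart, which is the overconfidence problem the abstract describes.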
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis
