Dual-Modality Anchor-Guided Filtering for Test-time Prompt Tuning
Jungwon Choi, Eunwoo Kim

TL;DR
This paper introduces a dual-modality anchor-guided filtering framework for test-time prompt tuning. It grounds view selection in semantic evidence from text and image anchors, improving model adaptation under distribution shift.
Contribution
It proposes an anchor-guided approach that uses a text anchor and an image anchor to improve both view filtering and model supervision during test-time prompt tuning.
Findings
Achieves state-of-the-art results on 15 benchmark datasets.
Improves robustness of prompt tuning under distribution shifts.
Enhances view selection accuracy through semantic grounding.
Abstract
Test-Time Prompt Tuning (TPT) adapts vision-language models using augmented views, but its effectiveness is hindered by the difficulty of determining which views are beneficial. Standard entropy-based filtering relies on the model's internal confidence scores, which are often miscalibrated under distribution shift: they assign high confidence to irrelevant crops or background regions while ignoring semantic content. To address this, we propose a dual-modality anchor-guided framework that grounds view selection in semantic evidence. We introduce a text anchor, built from attribute-rich descriptions, that provides fine-grained class semantics, and an adaptive image anchor that captures evolving test-time statistics. Using these anchors, we filter views based on alignment and confidence, ensuring that only informative views guide adaptation. Moreover, we treat the anchors as auxiliary predictive…
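The contrast between entropy-only filtering and filtering on "alignment and confidence" can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the scoring rule or the anchor update, so the additive score in `anchor_guided_filter`, the `weight` parameter, and the moving-average update in `update_image_anchor` are all assumptions made for illustration, and the toy logits and features are invented.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def entropy_filter(view_logits, keep):
    """Baseline: keep the `keep` views with the lowest prediction entropy."""
    scores = [entropy(softmax(l)) for l in view_logits]
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(order[:keep])

def anchor_guided_filter(view_logits, view_feats, anchor, keep, weight=1.0):
    """Assumed scoring rule: anchor alignment plus weighted confidence.

    Confidence is 1 minus entropy normalised by log(num_classes).
    The paper's actual criterion may differ.
    """
    max_entropy = math.log(len(view_logits[0]))
    scores = []
    for logits, feat in zip(view_logits, view_feats):
        confidence = 1.0 - entropy(softmax(logits)) / max_entropy
        alignment = cosine(feat, anchor)
        scores.append(alignment + weight * confidence)
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return sorted(order[:keep])

def update_image_anchor(anchor, feats, momentum=0.9):
    """Assumed update: exponential moving average over selected view features."""
    mean = [sum(col) / len(feats) for col in zip(*feats)]
    return [momentum * a + (1.0 - momentum) * m for a, m in zip(anchor, mean)]

# Toy data (invented): view 0 is a confident but off-anchor background crop.
view_logits = [[5.0, 0.0, 0.0], [2.0, 0.0, 0.0], [1.5, 0.0, 0.0]]
view_feats = [[0.0, 1.0], [1.0, 0.0], [0.9, 0.1]]
anchor = [1.0, 0.0]

baseline_kept = entropy_filter(view_logits, keep=2)
anchored_kept = anchor_guided_filter(view_logits, view_feats, anchor, keep=2)
new_anchor = update_image_anchor(anchor, [view_feats[i] for i in anchored_kept])
```

On this toy input the entropy baseline keeps views 0 and 1, because view 0 has the sharpest prediction, while the anchor-aware score keeps views 1 and 2 and drops the misaligned crop. That is the failure mode the abstract attributes to miscalibrated confidence.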
