VPA: Fully Test-Time Visual Prompt Adaptation
Jiachen Sun, Mark Ibrahim, Melissa Hall, Ivan Evtimov, Z. Morley Mao, Cristian Canton Ferrer, Caner Hazirbas

TL;DR
VPA introduces a test-time visual prompt adaptation framework that uses learnable tokens to improve model robustness and generalization across out-of-distribution data, corruptions, and domain shifts without source data.
Contribution
VPA is presented as the first framework to generalize visual prompting with test-time adaptation, enhancing robustness and domain adaptation in vision models.
Findings
VPA improves OOD generalization by 3.3%.
VPA enhances corruption robustness by 6.5%.
VPA boosts domain adaptation performance by 5.2%.
Abstract
Textual prompt tuning has demonstrated significant performance improvements in adapting natural language processing models to a variety of downstream tasks by treating hand-engineered prompts as trainable parameters. Inspired by the success of textual prompting, several studies have investigated the efficacy of visual prompt tuning. In this work, we present Visual Prompt Adaptation (VPA), the first framework that generalizes visual prompting with test-time adaptation. VPA introduces a small number of learnable tokens, enabling fully test-time and storage-efficient adaptation without necessitating source-domain information. We examine our VPA design under diverse adaptation settings, encompassing single-image, batched-image, and pseudo-label adaptation. We evaluate VPA on multiple tasks, including out-of-distribution (OOD) generalization, corruption robustness, and domain adaptation.…
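As a rough illustration of the idea in the abstract (a frozen model, a small set of learnable prompt parameters, and adaptation on unlabeled test inputs only), the toy sketch below adapts a prompt for a single test input. It is not the authors' implementation. Several things here are assumptions made for the sake of a short, dependency-free example: the "model" is a tiny fixed linear classifier rather than a vision transformer, the prompt is added to the input features rather than prepended as tokens, the objective is prediction-entropy minimization (a common test-time adaptation objective that the text above does not confirm), and gradients are estimated by finite differences. All function names are hypothetical.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    # Shannon entropy (in nats) of a probability vector.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def frozen_model(features, weights):
    # Stand-in for a frozen pretrained model: one linear layer.
    # ASSUMPTION: a real setup would use a frozen vision backbone instead.
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def prompted_entropy(x, prompt, weights):
    # ASSUMPTION: the prompt is additive here; the paper describes learnable tokens.
    prompted = [xi + pi for xi, pi in zip(x, prompt)]
    return entropy(softmax(frozen_model(prompted, weights)))


def adapt_prompt(x, weights, steps=100, lr=0.5, eps=1e-5):
    # Single-input test-time adaptation: only the prompt is updated,
    # the model weights are never modified, and no labels are used.
    prompt = [0.0] * len(x)
    for _ in range(steps):
        grad = []
        for j in range(len(prompt)):
            up = prompt[:]
            up[j] += eps
            down = prompt[:]
            down[j] -= eps
            # Central finite difference of the entropy w.r.t. prompt[j].
            diff = prompted_entropy(x, up, weights) - prompted_entropy(x, down, weights)
            grad.append(diff / (2 * eps))
        prompt = [p - lr * g for p, g in zip(prompt, grad)]
    return prompt


# Hypothetical frozen weights (3 classes, 2 features) and one unlabeled test input.
weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
x = [0.6, 0.4]

before = prompted_entropy(x, [0.0, 0.0], weights)
prompt = adapt_prompt(x, weights)
after = prompted_entropy(x, prompt, weights)
print("entropy before:", before)
print("entropy after: ", after)
```

Only the prompt vector changes during adaptation, so the per-input state that would need to be stored is the prompt itself, which is the storage-efficiency argument the abstract makes for a small number of learnable tokens.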
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition
