VPA: Fully Test-Time Visual Prompt Adaptation

Jiachen Sun; Mark Ibrahim; Melissa Hall; Ivan Evtimov; Z. Morley Mao; Cristian Canton Ferrer; Caner Hazirbas

arXiv:2309.15251 · cs.CV · September 28, 2023

TL;DR

VPA introduces a test-time visual prompt adaptation framework that uses learnable tokens to improve model robustness and generalization across out-of-distribution data, corruptions, and domain shifts without source data.
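To make "learnable tokens" concrete, here is a minimal plain-Python sketch, assuming a ViT-style model whose input is a sequence of patch-token vectors and assuming the prompts are prepended to that sequence (one common placement; this page does not say where or how many tokens VPA inserts). The helpers `prepend_prompts` and `prompt_storage_bytes` and the toy sizes are hypothetical. The point is only that the backbone is untouched and the adapted state is a handful of vectors.

```python
import random


def prepend_prompts(prompts, patch_tokens):
    """Return a new token sequence with the prompt tokens placed first.

    Both arguments are lists of equal-length float vectors; a frozen
    encoder would then process the combined sequence unchanged.
    """
    dim = len(patch_tokens[0])
    if any(len(p) != dim for p in prompts):
        raise ValueError("prompt tokens must match the embedding dimension")
    return [list(p) for p in prompts] + [list(t) for t in patch_tokens]


def prompt_storage_bytes(num_prompts, embed_dim, bytes_per_param=4):
    """Bytes needed to store one set of adapted prompts (float32 by default)."""
    return num_prompts * embed_dim * bytes_per_param


rng = random.Random(0)
embed_dim, num_patches, num_prompts = 8, 16, 4  # toy sizes, for illustration only
# Patch tokens stand in for the output of a frozen patch-embedding layer.
patch_tokens = [[rng.gauss(0.0, 1.0) for _ in range(embed_dim)] for _ in range(num_patches)]
# The prompt tokens are the only parameters that test-time adaptation would change.
prompts = [[0.0] * embed_dim for _ in range(num_prompts)]

sequence = prepend_prompts(prompts, patch_tokens)
print(len(sequence))  # 20 tokens: 4 prompts followed by 16 patches

# Storage for 10 hypothetical prompt tokens at ViT-B's 768-dim width:
print(prompt_storage_bytes(10, 768))  # 30720 bytes (30 KiB)
```

Because only the prompt vectors change, keeping one adapted set per target distribution costs kilobytes rather than a full copy of the model's weights; the 768-dim width is used here only as a familiar reference point.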

Contribution

VPA is the first framework to generalize visual prompting with test-time adaptation, improving the robustness and domain adaptation of vision models.

Findings

01

VPA improves OOD generalization by 3.3%.

02

VPA enhances corruption robustness by 6.5%.

03

VPA boosts domain adaptation performance by 5.2%.

Abstract

Textual prompt tuning has demonstrated significant performance improvements in adapting natural language processing models to a variety of downstream tasks by treating hand-engineered prompts as trainable parameters. Inspired by the success of textual prompting, several studies have investigated the efficacy of visual prompt tuning. In this work, we present Visual Prompt Adaptation (VPA), the first framework that generalizes visual prompting with test-time adaptation. VPA introduces a small number of learnable tokens, enabling fully test-time and storage-efficient adaptation without necessitating source-domain information. We examine our VPA design under diverse adaptation settings, encompassing single-image, batched-image, and pseudo-label adaptation. We evaluate VPA on multiple tasks, including out-of-distribution (OOD) generalization, corruption robustness, and domain adaptation.…
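The batched-image setting can be illustrated with a toy test-time loop. This sketch assumes entropy minimization on unlabeled test predictions, a standard source-free test-time objective (popularized by TENT); the abstract does not state VPA's exact loss, so treat the objective, the mean-pooling "backbone", and every function name here as hypothetical. Only the prompt tokens receive updates: the linear head and the patch tokens stay frozen, and no source-domain data is used.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)


def forward(prompts, patches, head):
    """Toy frozen model: mean-pool [prompts; patches], then apply a fixed linear head."""
    tokens = prompts + patches
    n, dim = len(tokens), len(tokens[0])
    pooled = [sum(t[d] for t in tokens) / n for d in range(dim)]
    logits = [sum(w[d] * pooled[d] for d in range(dim)) for w in head]
    return logits, n


def batch_entropy_and_grad(prompts, batch, head):
    """Mean prediction entropy over a batch and its gradient w.r.t. each prompt token."""
    dim = len(prompts[0])
    grad = [0.0] * dim  # mean pooling gives every prompt token this same gradient
    total = 0.0
    for patches in batch:
        logits, n = forward(prompts, patches, head)
        p = softmax(logits)
        h = entropy(p)
        total += h
        # Closed form: dH/dz_k = -p_k * (log p_k + H)
        dh_dz = [-pk * (math.log(pk) + h) if pk > 0.0 else 0.0 for pk in p]
        for d in range(dim):
            # Each prompt coordinate enters the pooled vector with weight 1/n.
            grad[d] += sum(dh_dz[k] * head[k][d] for k in range(len(head))) / n
    b = len(batch)
    return total / b, [[g / b for g in grad] for _ in prompts]


def adapt_prompts(prompts, batch, head, lr=1.0, steps=50):
    """Gradient descent on mean entropy; only the prompt tokens are updated."""
    prompts = [list(p) for p in prompts]  # copy so the caller's prompts are untouched
    for _ in range(steps):
        _, grads = batch_entropy_and_grad(prompts, batch, head)
        prompts = [[v - lr * g for v, g in zip(p, gp)] for p, gp in zip(prompts, grads)]
    return prompts


rng = random.Random(0)
dim, num_classes, num_patches, num_prompts = 4, 3, 6, 2  # toy sizes
head = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_classes)]  # frozen
# Five unlabeled test "images", each a list of frozen patch tokens.
batch = [[[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_patches)]
         for _ in range(5)]
prompts = [[0.0] * dim for _ in range(num_prompts)]

before, _ = batch_entropy_and_grad(prompts, batch, head)
adapted = adapt_prompts(prompts, batch, head)
after, _ = batch_entropy_and_grad(adapted, batch, head)
print(f"mean entropy: {before:.3f} -> {after:.3f}")
```

With mean pooling, every prompt token receives the same gradient and acts as a shared shift of the pooled feature. In a real transformer, attention lets prompts interact with patches nonlinearly, and the gradient would come from autograd rather than the closed form used here.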

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition