Diverse Data Augmentation with Diffusions for Effective Test-time Prompt Tuning
Chun-Mei Feng, Kai Yu, Yong Liu, Salman Khan, Wangmeng Zuo

TL;DR
This paper introduces DiffTPT, a novel test-time prompt tuning method that uses pre-trained diffusion models to generate diverse data, enhancing adaptation to unseen domains and improving zero-shot accuracy in vision-language models.
Contribution
The paper proposes DiffTPT, which leverages diffusion models for diverse data augmentation and introduces a cosine similarity filter to improve test-time prompt tuning.
Findings
DiffTPT improves zero-shot accuracy by an average of 5.13% over the state-of-the-art TPT method.
Incorporates both conventional and diffusion-based data augmentation.
Uses cosine similarity filtering to select high-fidelity generated data.
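The cosine similarity filter in the last finding can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes image features have already been extracted by an encoder such as CLIP's (not shown), and the function names and the keep_ratio parameter are hypothetical stand-ins for whatever selection rule the paper actually uses.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def filter_by_cosine(test_feat, gen_feats, keep_ratio=0.5):
    """Keep the generated samples whose features are closest to the test image.

    keep_ratio is a hypothetical knob for this sketch; it is not a value
    taken from the paper.
    """
    sims = [cosine_similarity(test_feat, g) for g in gen_feats]
    k = max(1, math.ceil(keep_ratio * len(sims)))
    # Rank generated samples from most to least similar to the test feature.
    ranked = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)
    return ranked[:k], sims

# Toy 2-D features: the first and third generated samples point in roughly
# the same direction as the test feature, so they survive the filter.
test_feat = [1.0, 0.0]
gen_feats = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0], [-1.0, 0.0]]
kept, sims = filter_by_cosine(test_feat, gen_feats, keep_ratio=0.5)
```

The point of filtering in feature space is that a diffusion model can produce images that look plausible but drift away from the content of the test image; ranking by similarity to the original sample discards those before they influence the prompt.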
Abstract
Benefiting from prompt tuning, recent years have witnessed the promising performance of pre-trained vision-language models, e.g., CLIP, on versatile downstream tasks. In this paper, we focus on a particular setting of learning adaptive prompts on the fly for each test sample from an unseen new domain, which is known as test-time prompt tuning (TPT). Existing TPT methods typically rely on data augmentation and confidence selection. However, conventional data augmentation techniques, e.g., random resized crops, suffer from a lack of data diversity, while entropy-based confidence selection alone is not sufficient to guarantee prediction fidelity. To address these issues, we propose a novel TPT method, named DiffTPT, which leverages pre-trained diffusion models to generate diverse and informative new data. Specifically, we incorporate augmented data by both conventional method and…
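For readers unfamiliar with the entropy-based confidence selection the abstract refers to, the idea can be sketched as below. This is an illustrative sketch under stated assumptions, not code from the paper: each augmented view is assumed to already have a class-probability vector from the model, and select_confident and its keep_ratio parameter are hypothetical names chosen for this example.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_confident(view_probs, keep_ratio=0.5):
    """Keep the augmented views whose predictions have the lowest entropy.

    keep_ratio is a hypothetical parameter for this sketch, not a value
    reported in the paper.
    """
    ents = [entropy(p) for p in view_probs]
    k = max(1, math.ceil(keep_ratio * len(ents)))
    # Lower entropy means a more peaked, more confident prediction.
    ranked = sorted(range(len(ents)), key=lambda i: ents[i])
    return ranked[:k], ents

# Four augmented views of one test image, two classes each.
view_probs = [[0.9, 0.1], [0.5, 0.5], [1.0, 0.0], [0.6, 0.4]]
selected, ents = select_confident(view_probs, keep_ratio=0.5)
```

Note that low entropy only says the model is confident about a view, not that the prediction is correct, which is the gap the abstract points to when it says entropy alone cannot guarantee prediction fidelity.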
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Cancer-related molecular mechanisms research
Methods: Focus · Diffusion · Contrastive Language-Image Pre-training
