Distribution-Aware Prompt Tuning for Vision-Language Models
Eulrang Cho, Jooyeon Kim, Hyunwoo J. Kim

TL;DR
This paper introduces distribution-aware prompt tuning (DAPT) for vision-language models, which enhances feature space alignment by optimizing class dispersion, leading to improved generalization across multiple benchmark datasets.
Contribution
The paper proposes a novel distribution-aware prompt tuning method that maximizes inter-class dispersion and minimizes intra-class dispersion to improve vision-language model performance.
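The two dispersion terms named above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the use of squared Euclidean distance on L2-normalized embeddings, the use of per-class mean prototypes, and the weights `w_inter` and `w_intra` are all assumptions chosen for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis` (eps avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def inter_dispersion(class_embeddings):
    """Mean squared distance between every pair of distinct class embeddings.

    class_embeddings: array of shape (num_classes, dim), e.g. one text
    embedding per class. Larger values mean classes are further apart.
    """
    z = l2_normalize(np.asarray(class_embeddings, dtype=float))
    diff = z[:, None, :] - z[None, :, :]          # (K, K, dim) pairwise differences
    sq_dist = (diff ** 2).sum(axis=-1)            # (K, K) squared distances
    off_diagonal = ~np.eye(z.shape[0], dtype=bool)
    return sq_dist[off_diagonal].mean()


def intra_dispersion(features, labels):
    """Mean squared distance of each sample to its own class prototype.

    features: array of shape (num_samples, dim), e.g. image embeddings.
    labels:   integer class label per sample.
    The prototype is assumed here to be the per-class mean embedding.
    """
    z = l2_normalize(np.asarray(features, dtype=float))
    labels = np.asarray(labels)
    classes = np.unique(labels)
    total = 0.0
    for c in classes:
        members = z[labels == c]
        prototype = members.mean(axis=0)
        total += ((members - prototype) ** 2).sum(axis=-1).mean()
    return total / len(classes)


def dispersion_objective(class_embeddings, features, labels,
                         w_inter=1.0, w_intra=1.0):
    """Hypothetical combined loss: minimizing it pushes classes apart
    (negative inter term) and pulls samples toward their prototype."""
    return (-w_inter * inter_dispersion(class_embeddings)
            + w_intra * intra_dispersion(features, labels))
```

In a prompt-tuning setting, a term like this would be added to the usual classification loss and differentiated with respect to the learnable prompt vectors only, with the encoder weights kept frozen; the sketch omits that training loop.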
Findings
Significant performance improvements on 11 benchmark datasets.
Enhanced feature space alignment through distribution-aware prompt learning.
Better generalization compared to existing prompt tuning methods.
Abstract
Pre-trained vision-language models (VLMs) have shown impressive performance on various downstream tasks by utilizing knowledge learned from large data. In general, the performance of VLMs on target tasks can be further improved by prompt tuning, which adds context to the input image or text. By leveraging data from target tasks, various prompt-tuning methods have been studied in the literature. A key to prompt tuning is the feature space alignment between two modalities via learnable vectors with model parameters fixed. We observed that the alignment becomes more effective when embeddings of each modality are 'well-arranged' in the latent space. Inspired by this observation, we proposed distribution-aware prompt tuning (DAPT) for vision-language models, which is simple yet effective. Specifically, the prompts are learned by maximizing inter-dispersion, the distance between classes, as…
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling
