Distribution-Aware Prompt Tuning for Vision-Language Models

Eulrang Cho; Jooyeon Kim; Hyunwoo J. Kim

arXiv:2309.03406·cs.CV·September 8, 2023

TL;DR

This paper introduces distribution-aware prompt tuning (DAPT) for vision-language models, which enhances feature space alignment by optimizing class dispersion, leading to improved generalization across multiple benchmark datasets.

Contribution

The paper proposes a novel distribution-aware prompt tuning method that maximizes inter-class dispersion and minimizes intra-class dispersion to improve vision-language model performance.

Findings

1. Significant performance improvements on 11 benchmark datasets.
2. Enhanced feature space alignment through distribution-aware prompt learning.
3. Better generalization compared to existing prompt tuning methods.

Abstract

Pre-trained vision-language models (VLMs) have shown impressive performance on various downstream tasks by utilizing knowledge learned from large data. In general, the performance of VLMs on target tasks can be further improved by prompt tuning, which adds context to the input image or text. By leveraging data from target tasks, various prompt-tuning methods have been studied in the literature. A key to prompt tuning is the feature space alignment between two modalities via learnable vectors with model parameters fixed. We observed that the alignment becomes more effective when embeddings of each modality are 'well-arranged' in the latent space. Inspired by this observation, we proposed distribution-aware prompt tuning (DAPT) for vision-language models, which is simple yet effective. Specifically, the prompts are learned by maximizing inter-dispersion, the distance between classes, as…
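
To make the two dispersion objectives concrete, here is a minimal sketch in plain Python. It is not taken from the paper or the official repository: the function names, the exponential kernel with temperature `t`, and the mean-embedding class prototype are all illustrative assumptions, chosen only to show what "push classes apart, pull samples of one class together" could look like as loss terms.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inter_dispersion_loss(class_embs, t=2.0):
    """Hypothetical inter-class term: small when class embeddings are far apart.

    Assumption: mean of exp(-t * squared distance) over all class pairs.
    Minimizing it increases the distance between classes.
    Requires at least two class embeddings.
    """
    n = len(class_embs)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    total = sum(math.exp(-t * sq_dist(class_embs[i], class_embs[j]))
                for i, j in pairs)
    return total / len(pairs)


def intra_dispersion_loss(sample_embs, labels):
    """Hypothetical intra-class term: small when samples sit near their class mean.

    Assumption: the class prototype is the mean of that class's embeddings.
    """
    by_class = {}
    for emb, label in zip(sample_embs, labels):
        by_class.setdefault(label, []).append(emb)

    total = 0.0
    for embs in by_class.values():
        dim = len(embs[0])
        proto = [sum(e[d] for e in embs) / len(embs) for d in range(dim)]
        total += sum(sq_dist(e, proto) for e in embs)
    return total / len(sample_embs)


if __name__ == "__main__":
    far = [[1.0, 0.0], [0.0, 1.0]]
    near = [[1.0, 0.0], [0.9, 0.1]]
    # Well-separated classes should give the lower inter-class loss.
    print(inter_dispersion_loss(far) < inter_dispersion_loss(near))
```

In a prompt-tuning setting, terms like these would presumably be added to the usual classification loss and backpropagated only into the learnable prompt vectors, since the abstract states that the model parameters stay fixed. How the actual method weights or defines them is not stated on this page.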

Code & Models

Repositories

mlvlab/dapt · PyTorch · Official

Videos

Distribution-Aware Prompt Tuning for Vision-Language Models · YouTube

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling