Patch-Prompt Aligned Bayesian Prompt Tuning for Vision-Language Models

Xinyang Liu; Dongsheng Wang; Bowei Fang; Miaoge Li; Zhibin Duan; Yishi Xu; Bo Chen; Mingyuan Zhou

arXiv:2303.09100 · cs.CV · July 2, 2024 · 5 cites

Open Access

TL;DR

This paper proposes a Bayesian prompt tuning method for vision-language models that generates label-specific stochastic prompts, improving diversity and generalization across various tasks and datasets.

Contribution

It introduces a hierarchical Bayesian framework with semantic regularization for prompt tuning, enhancing diversity and reducing overfitting in vision-language models.
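The hierarchical generation step can be illustrated with a minimal NumPy sketch. It assumes that each class is represented by a fixed label embedding (for example, a text embedding of the class name), that the latent vector follows a diagonal Gaussian whose mean and log-variance are linear functions of that embedding, and that the "lightweight generative model" is a one-hidden-layer MLP emitting `n_ctx` prompt-token vectors. The class `StochasticPromptGenerator`, its parameters, the layer sizes, the untrained random weights, and the KL term against a standard normal prior are all hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


class StochasticPromptGenerator:
    """Hypothetical sketch: label embedding -> Gaussian latent -> prompt tokens."""

    def __init__(self, embed_dim, latent_dim, n_ctx, ctx_dim, seed=0):
        init = np.random.default_rng(seed)
        scale = 0.02  # small random init; weights are untrained in this sketch
        # Linear heads giving the mean and log-variance of q(z | label).
        self.W_mu = init.normal(0.0, scale, (embed_dim, latent_dim))
        self.W_logvar = init.normal(0.0, scale, (embed_dim, latent_dim))
        # Lightweight generator: one ReLU hidden layer, output split into n_ctx tokens.
        self.W_h = init.normal(0.0, scale, (latent_dim, 4 * latent_dim))
        self.W_out = init.normal(0.0, scale, (4 * latent_dim, n_ctx * ctx_dim))
        self.n_ctx, self.ctx_dim = n_ctx, ctx_dim

    def __call__(self, label_emb, rng):
        mu = label_emb @ self.W_mu                     # (C, latent_dim)
        logvar = label_emb @ self.W_logvar             # (C, latent_dim)
        eps = rng.standard_normal(mu.shape)            # fresh noise on every call
        z = mu + np.exp(0.5 * logvar) * eps            # reparameterized latent sample
        h = np.maximum(z @ self.W_h, 0.0)              # ReLU hidden layer
        prompts = (h @ self.W_out).reshape(-1, self.n_ctx, self.ctx_dim)
        # KL(q(z | label) || N(0, I)) for a diagonal Gaussian, averaged over labels.
        kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=1).mean()
        return prompts, float(kl)


rng = np.random.default_rng(1)
label_emb = rng.standard_normal((10, 512))  # stand-in for 10 class-name embeddings
gen = StochasticPromptGenerator(embed_dim=512, latent_dim=64, n_ctx=4, ctx_dim=512)
p1, kl = gen(label_emb, rng)   # p1.shape == (10, 4, 512)
p2, _ = gen(label_emb, rng)    # same labels, different sampled prompts
```

Because fresh noise is drawn on every call, two calls return different prompts for the same label, which is the stochasticity the method relies on. In a full pipeline the sampled tokens would presumably be combined with the class-name tokens and passed through the frozen text encoder, with the KL term added to the training loss.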

Findings

1. Improves few-shot image recognition accuracy
2. Enhances generalization to new categories and datasets
3. Demonstrates strong transferability across 15 datasets

Abstract

For downstream applications of vision-language pre-trained models, there has been significant interest in constructing effective prompts. Existing works on prompt engineering, which either require laborious manual designs or optimize the prompt tuning as a point estimation problem, may fail to describe diverse characteristics of categories and limit their applications. We introduce a Bayesian probabilistic resolution to prompt tuning, where the label-specific stochastic prompts are generated hierarchically by first sampling a latent vector from an underlying distribution and then employing a lightweight generative model. Importantly, we semantically regularize the tuning process by minimizing the statistical distance between the visual patches and linguistic prompts, which pushes the stochastic label representations to faithfully capture diverse visual concepts, instead of overfitting…
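The semantic regularizer compares a set of visual patch embeddings with a set of prompt-token embeddings. The abstract only says "statistical distance", so the sketch below swaps in one concrete choice, an entropic optimal-transport (Sinkhorn) distance with a cosine cost and uniform weights on both sides; the paper's actual distance, weighting, and hyperparameters may differ. The function names `cosine_cost` and `sinkhorn_distance`, the parameters `eps` and `n_iters`, and the random stand-in embeddings are hypothetical.

```python
import numpy as np


def cosine_cost(x, y):
    """Pairwise cost 1 - cos(x_i, y_j) between the rows of x and y."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - x @ y.T


def sinkhorn_distance(patches, prompts, eps=0.1, n_iters=200):
    """Entropic OT cost between patch and prompt embeddings (uniform marginals)."""
    cost = cosine_cost(patches, prompts)             # (n_patches, n_tokens), in [0, 2]
    a = np.full(cost.shape[0], 1.0 / cost.shape[0])  # mass per visual patch
    b = np.full(cost.shape[1], 1.0 / cost.shape[1])  # mass per prompt token
    K = np.exp(-cost / eps)                          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):                         # alternating marginal scaling
        v = b / (K.T @ u)
        u = a / (K @ v)
    plan = u[:, None] * K * v[None, :]               # transport plan, marginals ~ (a, b)
    return float(np.sum(plan * cost)), plan


rng = np.random.default_rng(0)
patches = rng.standard_normal((49, 512))   # stand-in for 7x7 image patch features
prompts = rng.standard_normal((4, 512))    # stand-in for 4 prompt-token features
dist_random, plan = sinkhorn_distance(patches, prompts)
# Prompts that copy some of the patches should be "closer" to the image.
dist_aligned, _ = sinkhorn_distance(patches, patches[:4].copy())
```

A lower value means the prompt tokens sit closer to the image's patch features. Minimizing it with respect to the prompts (through an autodiff framework rather than plain NumPy) would pull the label representation toward visual concepts present in the images, which is the intent the abstract describes.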

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
