VaMP: Variational Multi-Modal Prompt Learning for Vision-Language Models

Silin Cheng; Kai Han

arXiv:2511.22664 · cs.CV · December 1, 2025

TL;DR

VaMP is a variational multi-modal prompt learning framework for vision-language models. It samples instance-conditioned prompts from a learned posterior, which makes prompt tuning sample-specific and uncertainty-aware, with reported gains in few-shot and domain generalization performance.

Contribution

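The paper presents a variational approach to multi-modal prompt tuning that replaces fixed, shared, deterministic prompts with instance-specific prompts sampled from a learned posterior, adding uncertainty modeling when adapting VLMs such as CLIP to downstream tasks with limited supervision.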

Findings

01. Achieves state-of-the-art results on few-shot benchmarks.

02. Effectively models uncertainty and instance variation.

03. Enhances domain generalization in vision-language tasks.

Abstract

Vision-language models (VLMs), such as CLIP, have shown strong generalization under zero-shot settings, yet adapting them to downstream tasks with limited supervision remains a significant challenge. Existing multi-modal prompt learning methods typically rely on fixed, shared prompts and deterministic parameters, which limits their ability to capture instance-level variation or model uncertainty across diverse tasks and domains. To tackle this issue, we propose a novel Variational Multi-Modal Prompt Learning (VaMP) framework that enables sample-specific, uncertainty-aware prompt tuning in multi-modal representation learning. VaMP generates instance-conditioned prompts by sampling from a learned posterior distribution, allowing the model to personalize its behavior based on input content. To further enhance the integration of local and global semantics, we introduce a class-aware prior…
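The idea of sampling an instance-conditioned prompt from a learned posterior can be illustrated with the standard reparameterization trick. The sketch below is a minimal, generic example and not the authors' implementation: it assumes a diagonal Gaussian posterior whose mean and log-variance are linear functions of an input feature vector. The names `linear`, `sample_prompt`, and the parameter keys `w_mu`, `b_mu`, `w_logvar`, `b_logvar` are hypothetical and do not come from the paper.

```python
import math
import random


def linear(x, weight, bias):
    """Apply y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def sample_prompt(feature, params, rng):
    """Draw one prompt vector z ~ N(mu(x), diag(sigma(x)^2)).

    Assumption: the posterior is a diagonal Gaussian whose mean and
    log-variance are linear in the input feature. This is an
    illustrative choice, not the architecture described in the paper.
    """
    mu = linear(feature, params["w_mu"], params["b_mu"])
    log_var = linear(feature, params["w_logvar"], params["b_logvar"])
    # Reparameterization: z = mu + sigma * eps with eps ~ N(0, I),
    # so the randomness is isolated in eps and z stays a smooth
    # function of mu and log_var.
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    z = [m + math.exp(0.5 * lv) * e for m, lv, e in zip(mu, log_var, eps)]
    return z, mu, log_var
```

Calling `sample_prompt(feature, params, random.Random(0))` with two different feature vectors yields two different posterior means, so each input receives its own prompt, and repeated draws for the same input vary according to the predicted variance. In a real system the linear maps would be replaced by learned networks in an automatic-differentiation framework.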

Videos

VaMP: Variational Multi-Modal Prompt Learning for Vision-Language Models · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis