MuAP: Multi-step Adaptive Prompt Learning for Vision-Language Model with Missing Modality
Ruiting Dai, Yuqiao Tan, Lisi Mo, Tao He, Ke Qin, Shuang Liang

TL;DR
This paper introduces MuAP, a novel multi-step adaptive prompt learning framework designed for vision-language models with missing modalities, improving robustness and performance in incomplete data scenarios.
Contribution
It is the first to explore prompt learning with incomplete modalities and proposes a multi-step adaptive approach to generate and tune multimodal prompts.
Findings
MuAP significantly outperforms state-of-the-art methods on benchmark datasets.
The framework effectively mitigates modality imbalance issues.
Adaptive prompt tuning enhances model robustness with missing modalities.
Abstract
Recently, prompt learning has garnered considerable attention for its success in various Vision-Language (VL) tasks. However, existing prompt-based models are primarily focused on studying prompt generation and prompt strategies with complete modality settings, which does not accurately reflect real-world scenarios where partial modality information may be missing. In this paper, we present the first comprehensive investigation into prompt learning behavior when modalities are incomplete, revealing the high sensitivity of prompt-based models to missing modalities. To this end, we propose a novel Multi-step Adaptive Prompt Learning (MuAP) framework, aiming to generate multimodal prompts and perform multi-step prompt tuning, which adaptively learns knowledge by iteratively aligning modalities. Specifically, we generate multimodal prompts for each modality and devise prompt strategies to…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMultimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
MethodsAttention Is All You Need · Byte Pair Encoding · Absolute Position Encodings · Softmax · Label Smoothing · Dropout · Layer Normalization · Position-Wise Feed-Forward Layer · Linear Layer · Adam
