MuAP: Multi-step Adaptive Prompt Learning for Vision-Language Model with Missing Modality

Ruiting Dai; Yuqiao Tan; Lisi Mo; Tao He; Ke Qin; Shuang Liang

arXiv:2409.04693 · cs.AI · September 10, 2024

TL;DR

This paper introduces MuAP, a novel multi-step adaptive prompt learning framework designed for vision-language models with missing modalities, improving robustness and performance in incomplete data scenarios.

Contribution

The paper presents the first comprehensive investigation of prompt learning under incomplete modalities and proposes a multi-step adaptive approach that generates and tunes multimodal prompts.

Findings

01. MuAP significantly outperforms state-of-the-art methods on benchmark datasets.

02. The framework effectively mitigates modality imbalance issues.

03. Adaptive prompt tuning enhances model robustness with missing modalities.

Abstract

Recently, prompt learning has garnered considerable attention for its success in various Vision-Language (VL) tasks. However, existing prompt-based models are primarily focused on studying prompt generation and prompt strategies with complete modality settings, which does not accurately reflect real-world scenarios where partial modality information may be missing. In this paper, we present the first comprehensive investigation into prompt learning behavior when modalities are incomplete, revealing the high sensitivity of prompt-based models to missing modalities. To this end, we propose a novel Multi-step Adaptive Prompt Learning (MuAP) framework, aiming to generate multimodal prompts and perform multi-step prompt tuning, which adaptively learns knowledge by iteratively aligning modalities. Specifically, we generate multimodal prompts for each modality and devise prompt strategies to…

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Attention Is All You Need · Byte Pair Encoding · Absolute Position Encodings · Softmax · Label Smoothing · Dropout · Layer Normalization · Position-Wise Feed-Forward Layer · Linear Layer · Adam