Deep Correlated Prompting for Visual Recognition with Missing Modalities
Lianyu Hu, Tongkai Shi, Wei Feng, Fanhua Shang, Liang Wan

TL;DR
This paper introduces a novel prompt learning approach that leverages correlations between prompts and input features to improve multimodal visual recognition performance when some modalities are missing, addressing real-world data collection challenges.
Contribution
It proposes a correlation-aware prompt learning method that effectively handles missing modalities by exploiting inter-prompt and layer relationships, enhancing robustness of pretrained multimodal models.
Findings
Outperforms previous methods across multiple datasets and missing-modality scenarios.
Demonstrates robustness across different modality-missing ratios and types.
Shows generalizability and reliability through extensive ablation studies.
Abstract
Large-scale multimodal models have shown excellent performance across a range of tasks, powered by large corpora of paired multimodal training data. They are generally assumed to receive modality-complete inputs. However, this assumption may not hold in the real world because of privacy constraints or collection difficulty, and models pretrained on modality-complete data readily show degraded performance in missing-modality cases. To handle this issue, we turn to prompt learning to adapt large pretrained multimodal models to missing-modality scenarios by regarding different missing cases as different types of input. Instead of only prepending independent prompts to the intermediate layers, we propose to leverage the correlations between prompts and input features and to exploit the relationships between different layers of prompts to carefully design the…
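The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract is truncated here, so the concrete design is an assumption. The sketch assumes three missing-case labels, a first-layer prompt set selected by missing case, and a next-layer prompt generated from the previous layer's prompt plus a pooled input feature. The class `CorrelatedPrompts`, the helper names, and the projection matrices `w_prev` and `w_feat` are all hypothetical, and plain Python lists stand in for tensors.

```python
import random

# Assumed labels for the missing cases; the source only says that different
# missing cases are treated as different types of input.
MISSING_CASES = ("complete", "missing_text", "missing_image")


def make_matrix(rows, cols, rng):
    """Small random matrix standing in for a learnable parameter."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product on plain lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def mean_token(tokens):
    """Average the token features into one pooled feature vector."""
    return [sum(column) / len(tokens) for column in zip(*tokens)]


class CorrelatedPrompts:
    """Hypothetical sketch: prompts depend on earlier prompts and on the input."""

    def __init__(self, dim, prompt_len, num_layers, seed=0):
        rng = random.Random(seed)
        # One first-layer prompt set per missing case.
        self.first = {c: make_matrix(prompt_len, dim, rng) for c in MISSING_CASES}
        # Hypothetical per-layer projections linking layer l to layer l + 1.
        self.w_prev = [make_matrix(dim, dim, rng) for _ in range(num_layers - 1)]
        self.w_feat = [make_matrix(dim, dim, rng) for _ in range(num_layers - 1)]

    def first_layer(self, case):
        return self.first[case]

    def next_layer(self, layer, prev_prompts, tokens):
        # The pooled input feature conditions the prompts (prompt-feature link).
        context = matvec(self.w_feat[layer], mean_token(tokens))
        new_prompts = []
        for prompt in prev_prompts:
            # The previous layer's prompt is projected forward (layer link).
            projected = matvec(self.w_prev[layer], prompt)
            new_prompts.append([a + b for a, b in zip(projected, context)])
        return new_prompts


def prepend(prompts, tokens):
    """Prepend prompt vectors to the token sequence of one layer."""
    return prompts + tokens


# Usage: three token features of width 4, text modality assumed missing.
model = CorrelatedPrompts(dim=4, prompt_len=2, num_layers=3)
tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
prompts = model.first_layer("missing_text")
for layer in range(2):
    layer_input = prepend(prompts, tokens)  # a transformer block would consume this
    prompts = model.next_layer(layer, prompts, tokens)
```

In this sketch the prompts at each layer are not independent parameters: changing the input tokens or the first-layer prompt changes every later prompt, which is the contrast with independent per-layer prompts that the abstract draws.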
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Image Processing Techniques and Applications
