Towards Robust Multimodal Prompting With Missing Modalities
Jaehyuk Jang, Yooseung Wang, Changick Kim

TL;DR
This paper proposes a prompt design for multimodal models that replaces missing-aware prompts with modality-specific tokens trained under an orthogonality constraint. The design avoids exponential growth in the number of prompts as modalities are added and improves robustness when the missing-modality setting differs between training and inference.
Contribution
A simple prompt design that uses prompts as modality-specific tokens with an orthogonality constraint between them, improving robustness to missing modalities and reducing the number of prompts required.
Findings
Enhanced performance and robustness in multimodal tasks
Reduced number of prompts needed for effective prompting
Better robustness when the missing-modality setting differs between training and inference
Abstract
Recently, multimodal prompting, which introduces learnable missing-aware prompts for all missing modality cases, has exhibited impressive performance. However, it encounters two critical issues: 1) The number of prompts grows exponentially as the number of modalities increases; and 2) It lacks robustness in scenarios with different missing modality settings between training and inference. In this paper, we propose a simple yet effective prompt design to address these challenges. Instead of using missing-aware prompts, we utilize prompts as modality-specific tokens, enabling them to capture the unique characteristics of each modality. Furthermore, our prompt design leverages orthogonality between prompts as a key element to learn distinct information across different modalities and promote diversity in the learned representations. Extensive experiments demonstrate that our prompt design…
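The two ideas in the abstract, prompt-count growth and orthogonality between modality-specific prompts, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names are hypothetical, the count of 2^M - 1 assumes one missing-aware prompt per non-empty subset of available modalities, and the penalty (mean absolute cosine similarity between prompt vectors) is one common way to encourage orthogonality, chosen here for illustration only.

```python
import math


def missing_aware_prompt_count(num_modalities):
    # Assumption: one prompt per non-empty subset of available modalities,
    # which grows exponentially with the number of modalities.
    return 2 ** num_modalities - 1


def modality_specific_prompt_count(num_modalities):
    # One prompt (set of tokens) per modality, so growth is linear.
    return num_modalities


def cosine(u, v):
    # Cosine similarity of two equal-length, non-zero vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def orthogonality_penalty(prompts):
    # Illustrative penalty: mean absolute cosine similarity over all pairs
    # of modality prompts. It is 0 when every pair is orthogonal.
    names = sorted(prompts)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    total = sum(abs(cosine(prompts[a], prompts[b])) for a, b in pairs)
    return total / len(pairs)


if __name__ == "__main__":
    for m in (2, 3, 4):
        print(m, missing_aware_prompt_count(m), modality_specific_prompt_count(m))
    # Hypothetical flattened prompt vectors for two modalities.
    print(orthogonality_penalty({"image": [1.0, 0.0], "text": [0.0, 1.0]}))
```

In a training setup, a penalty of this kind would typically be added to the task loss with a weighting coefficient; the weighting and the exact form used in the paper are not stated in the text above.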
Taxonomy
Topics: Speech and dialogue systems · Multimodal Machine Learning Applications · Speech and Audio Processing
