Robust Multimodal Learning with Missing Modalities via Parameter-Efficient Adaptation
Md Kaykobad Reza, Ashley Prater-Bennette, M. Salman Asif

TL;DR
This paper introduces a parameter-efficient adaptation method for pretrained multimodal networks that enhances robustness to missing modalities by modulating intermediate features, outperforming existing approaches across diverse tasks and datasets.
Contribution
The authors propose a simple, parameter-efficient adaptation technique that improves robustness of multimodal models to missing data by modulating intermediate features, requiring less than 1% additional parameters.
Findings
Improves performance with missing modalities across five tasks.
Requires fewer than 1% of total parameters for adaptation.
Outperforms existing methods in robustness to missing modalities.
Abstract
Multimodal learning seeks to utilize data from multiple sources to improve the overall performance of downstream tasks. It is desirable for redundancies in the data to make multimodal systems robust to missing or corrupted observations in some correlated modalities. However, we observe that the performance of several existing multimodal networks significantly deteriorates if one or multiple modalities are absent at test time. To enable robustness to missing modalities, we propose a simple and parameter-efficient adaptation procedure for pretrained multimodal networks. In particular, we exploit modulation of intermediate features to compensate for the missing modalities. We demonstrate that such adaptation can partially bridge performance drop due to missing modalities and outperform independent, dedicated networks trained for the available modality combinations in some cases. The…
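The abstract does not say what form the feature modulation takes, so the following is only a minimal sketch under a stated assumption: each frozen pretrained layer is followed by a learned per-feature scale and shift. The names `FrozenLinear`, `ScaleShift`, `forward`, and `adapter_fraction` are hypothetical and are not taken from the paper. The sketch illustrates why such an adapter is cheap relative to the layer it modulates.

```python
from dataclasses import dataclass


@dataclass
class FrozenLinear:
    """Stand-in for one pretrained layer; its weights are never updated."""
    weight: list  # shape [out_dim][in_dim]
    bias: list    # shape [out_dim]

    def __call__(self, x):
        # Plain matrix-vector product plus bias.
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.weight, self.bias)]

    def num_params(self):
        return sum(len(row) for row in self.weight) + len(self.bias)


@dataclass
class ScaleShift:
    """Assumed adapter: per-feature scale (gamma) and shift (beta)."""
    gamma: list
    beta: list

    @classmethod
    def identity(cls, dim):
        # gamma = 1, beta = 0 leaves the pretrained features unchanged.
        return cls([1.0] * dim, [0.0] * dim)

    def __call__(self, h):
        return [g * hi + b for g, hi, b in zip(self.gamma, h, self.beta)]

    def num_params(self):
        return len(self.gamma) + len(self.beta)


def forward(layers, adapters, x):
    """Run the frozen layers, modulating each intermediate feature vector."""
    for layer, adapter in zip(layers, adapters):
        x = adapter(layer(x))
    return x


def adapter_fraction(layers, adapters):
    """Ratio of adapter parameters to frozen backbone parameters."""
    added = sum(a.num_params() for a in adapters)
    frozen = sum(l.num_params() for l in layers)
    return added / frozen
```

Under this assumption, one set of `ScaleShift` adapters could be trained per missing-modality scenario while the backbone stays fixed. For a single 256-by-256 layer the adapter adds 512 parameters against 65,792 frozen ones, which is below 1%, in line with the budget quoted in the summary.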
Taxonomy
Topics: Text and Document Classification Technologies · Music and Audio Processing · Speech Recognition and Synthesis
