SimMLM: A Simple Framework for Multi-modal Learning with Missing Modality
Sijie Li, Chen Chen, Jungong Han

TL;DR
SimMLM introduces a simple, adaptable framework for multimodal learning that maintains high accuracy even when some modalities are missing, using a dynamic gating mechanism and a novel ranking loss.
Contribution
It proposes a generic Dynamic Mixture of Modality Experts (DMoME) architecture together with a More vs. Fewer (MoFe) ranking loss, improving robustness and accuracy in multimodal learning when modalities are missing.
Findings
Outperforms existing methods on medical image segmentation and classification tasks.
Accuracy improves or remains stable as more modalities are made available.
Demonstrates robustness and interpretability across diverse datasets.
Abstract
In this paper, we propose SimMLM, a simple yet powerful framework for multimodal learning with missing modalities. Unlike existing approaches that rely on sophisticated network architectures or complex data imputation techniques, SimMLM provides a generic and effective solution that can adapt to various missing modality scenarios with improved accuracy and robustness. Specifically, SimMLM consists of a generic Dynamic Mixture of Modality Experts (DMoME) architecture, featuring a dynamic, learnable gating mechanism that automatically adjusts each modality's contribution in both full and partial modality settings. A key innovation of SimMLM is the proposed More vs. Fewer (MoFe) ranking loss, which ensures that task accuracy improves or remains stable as more modalities are made available. This aligns the model with an intuitive principle: removing one or more modalities should not…
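The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `gated_fusion` and `more_vs_fewer_penalty`, the masked-softmax gate, and the hinge form of the penalty are all assumptions made for illustration, and the gate scores are passed in as plain numbers rather than produced by a learned network.

```python
import numpy as np

def gated_fusion(expert_outputs, gate_scores, available):
    """Fuse per-modality expert outputs with a masked softmax gate.

    expert_outputs: (M, C) array, one prediction vector per modality expert.
    gate_scores:    (M,) unnormalised gate scores (learned in a real model).
    available:      (M,) booleans, False where a modality is missing.
    All names here are hypothetical, not taken from the paper.
    """
    expert_outputs = np.asarray(expert_outputs, dtype=float)
    gate_scores = np.asarray(gate_scores, dtype=float)
    available = np.asarray(available, dtype=bool)
    if not available.any():
        raise ValueError("at least one modality must be available")
    # Missing modalities get a score of -inf, so their weight is exactly 0.
    masked = np.where(available, gate_scores, -np.inf)
    masked = masked - masked[available].max()  # numerical stability
    weights = np.exp(masked)
    weights /= weights.sum()
    # Zero the rows of missing experts so placeholder values cannot leak in.
    safe_outputs = np.where(available[:, None], expert_outputs, 0.0)
    return weights @ safe_outputs, weights

def more_vs_fewer_penalty(loss_more, loss_fewer, margin=0.0):
    """Assumed hinge penalty: positive only when the task loss with MORE
    modalities exceeds the task loss with FEWER modalities (plus margin)."""
    return max(0.0, loss_more - loss_fewer + margin)

# Three modalities, the second one missing (its row holds NaN placeholders).
outputs = [[1.0, 0.0], [np.nan, np.nan], [0.0, 1.0]]
fused, weights = gated_fusion(outputs, [0.0, 5.0, 0.0], [True, False, True])
penalty = more_vs_fewer_penalty(loss_more=0.7, loss_fewer=0.5)
```

In this sketch the gate renormalises over the modalities that are present, so the same fusion code covers both the full and the partial modality settings, and the penalty is zero whenever adding modalities does not hurt the task loss.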
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling
