MM-Align: Learning Optimal Transport-based Alignment Dynamics for Fast and Accurate Inference on Missing Modality Sequences
Wei Han, Hui Chen, Min-Yen Kan, Soujanya Poria

TL;DR
MM-Align introduces a novel optimal transport-based method for fast, accurate inference in multimodal tasks with missing modalities, improving imputation and reducing overfitting.
Contribution
It proposes a new alignment dynamics learning approach using optimal transport and a denoising training algorithm for missing modality inference.
Findings
Outperforms previous methods in accuracy and speed
Effective in various missing modality scenarios
Reduces overfitting in multimodal inference
Abstract
Existing multimodal tasks mostly target the complete input modality setting, i.e., each modality is either complete or completely missing in both training and test sets. However, randomly missing situations remain underexplored. In this paper, we present a novel approach named MM-Align to address the missing-modality inference problem. Concretely, we propose 1) an alignment dynamics learning module based on the theory of optimal transport (OT) for indirect missing data imputation; 2) a denoising training algorithm to simultaneously enhance the imputation results and backbone network performance. Compared with previous methods that are devoted to reconstructing the missing inputs, MM-Align learns to capture and imitate the alignment dynamics between modality sequences. Results of comprehensive experiments on three datasets covering two multimodal tasks empirically demonstrate…
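To make the optimal transport idea concrete, here is a minimal sketch of how an entropic OT plan between two modality sequences can be computed with Sinkhorn iterations. It is not the MM-Align alignment dynamics module or its training procedure; it only illustrates the underlying notion of a soft alignment matrix between two sequences. The function name `sinkhorn_plan`, the squared Euclidean cost, the uniform marginals, the regularization value, and the random "text" and "audio" features are all assumptions made for illustration.

```python
import numpy as np

def sinkhorn_plan(x, y, reg=0.5, n_iters=200):
    """Entropic-regularized OT plan between two feature sequences.

    x: (n, d) array, y: (m, d) array.
    Uniform marginals and a squared Euclidean cost are illustrative assumptions.
    """
    n, m = x.shape[0], y.shape[0]
    # Pairwise squared Euclidean cost, rescaled to [0, 1] for numerical stability.
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    cost = cost / (cost.max() + 1e-12)
    kernel = np.exp(-cost / reg)
    a = np.full(n, 1.0 / n)  # mass assigned to each step of x
    b = np.full(m, 1.0 / m)  # mass assigned to each step of y
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)
    return u[:, None] * kernel * v[None, :]

# Hypothetical stand-ins for two modality sequences of different lengths.
rng = np.random.default_rng(0)
text_seq = rng.normal(size=(5, 8))
audio_seq = rng.normal(size=(7, 8))

plan = sinkhorn_plan(text_seq, audio_seq)

# Barycentric projection: each text step becomes a plan-weighted
# average of the audio steps it is aligned with.
audio_in_text_frame = (plan / plan.sum(axis=1, keepdims=True)) @ audio_seq
```

Each entry `plan[i, j]` is the amount of mass moved from step `i` of the first sequence to step `j` of the second, so the matrix acts as a soft alignment. A method that learns alignment dynamics would presumably predict or imitate such a structure when one modality is unavailable at inference time, but how MM-Align does so is not specified in this excerpt.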
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Speech Recognition and Synthesis · Multimodal Machine Learning Applications
Methods: Test
