Robust Multimodal Learning via Representation Decoupling
Shicai Wei, Yang Luo, Yuji Wang, Chunbo Luo

TL;DR
This paper introduces DMRNet, a multimodal learning model that represents the input from each modality combination as a probability distribution in the latent space rather than a fixed point, so that it can better capture modality-specific information and remain robust to missing modalities.
Contribution
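The paper proposes DMRNet, which models the input from different modality combinations as distributions and samples embeddings from them. Sampling relaxes the implicit constraint that same-class samples with different modalities must share a representation direction, which strengthens modality-specific learning and robustness.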
Findings
DMRNet outperforms state-of-the-art methods on classification tasks.
The probabilistic approach improves robustness to missing modalities.
The hard combination regularizer balances training across modality combinations.
Abstract
Multimodal learning robust to missing modality has attracted increasing attention due to its practicality. Existing methods tend to address it by learning a common subspace representation for different modality combinations. However, we reveal that they are sub-optimal due to their implicit constraint on intra-class representation. Specifically, the sample with different modalities within the same class will be forced to learn representations in the same direction. This hinders the model from capturing modality-specific information, resulting in insufficient learning. To this end, we propose a novel Decoupled Multimodal Representation Network (DMRNet) to assist robust multimodal learning. Specifically, DMRNet models the input from different modality combinations as a probabilistic distribution instead of a fixed point in the latent space, and samples embeddings from the distribution for…
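The idea of sampling an embedding from a distribution instead of using a fixed point can be illustrated with a minimal sketch. The abstract does not state the distribution family or the sampling procedure, so this sketch assumes a diagonal Gaussian and the standard reparameterization z = mean + std * eps. The names sample_embedding, mean, and log_var are hypothetical and are not taken from the paper.

```python
import math
import random


def sample_embedding(mean, log_var, rng):
    """Draw one embedding z = mean + std * eps, with eps ~ N(0, 1) per dimension.

    Assumption: the latent distribution is a diagonal Gaussian parameterized
    by a mean vector and a log-variance vector (not confirmed by the abstract).
    """
    if len(mean) != len(log_var):
        raise ValueError("mean and log_var must have the same length")
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mean, log_var)
    ]


# Hypothetical encoder outputs for one sample under one modality combination.
mean = [0.5, -1.0, 2.0]
log_var = [-2.0, -2.0, -2.0]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
samples = [sample_embedding(mean, log_var, rng) for _ in range(5000)]

# Averaging many sampled embeddings should approximately recover the mean.
empirical_mean = [
    sum(s[i] for s in samples) / len(samples) for i in range(len(mean))
]

# With a very small variance the sample collapses to the fixed-point embedding.
collapsed = sample_embedding(mean, [-1000.0] * 3, rng)
```

In this sketch, a larger log_var lets embeddings of the same class spread out around the mean, whereas a very small log_var reduces to the fixed-point case that the abstract describes as overly constrained.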
Taxonomy
Topics: Text and Document Classification Technologies
Methods: Softmax · Attention Is All You Need
