Mixture of Disentangled Experts with Missing Modalities for Robust Multimodal Sentiment Analysis
Xiang Li, Xiaoming Zhang, Dezhuang Miao, Xianfu Cheng, Dawei Li, Honggui Han, and Zhoujun Li

TL;DR
This paper introduces DERL, a novel framework that disentangles multimodal features into private and shared spaces, improving robustness and accuracy in multimodal sentiment analysis with missing or corrupted data.
Contribution
DERL employs hybrid experts and a multi-level reconstruction strategy to enhance the robustness of multimodal sentiment analysis under missing modality conditions.
Findings
DERL outperforms state-of-the-art methods on two benchmarks.
It achieves a 2.47% accuracy improvement on MOSI with missing data.
It enhances the robustness and expressiveness of multimodal representations.
Abstract
Multimodal Sentiment Analysis (MSA) integrates multiple modalities to infer human sentiment, but real-world noise often leads to missing or corrupted data. However, existing feature-disentangled methods struggle to handle the internal variations of heterogeneous information under uncertain missingness, making it difficult to learn effective multimodal representations from degraded modalities. To address this issue, we propose DERL, a Disentangled Expert Representation Learning framework for robust MSA. Specifically, DERL employs hybrid experts to adaptively disentangle multimodal inputs into orthogonal private and shared representation spaces. A multi-level reconstruction strategy is further developed to provide collaborative supervision, enhancing both the expressiveness and robustness of the learned representations. Finally, the disentangled features act as modality experts with…
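The abstract says the private and shared representation spaces are orthogonal but does not show how that is enforced. As a rough illustration only, one common way to encourage orthogonality between two feature sets is to penalize the squared cross-products of their feature columns over a batch. The sketch below is an assumption for illustration, not DERL's actual loss; the function name `orthogonality_penalty` and the batch-size normalization are hypothetical.

```python
def orthogonality_penalty(shared, private):
    """Illustrative penalty (not taken from the paper).

    shared, private: lists of rows, one row per sample, each row a list of
    feature values. Returns the sum of squared dot products between every
    shared feature column and every private feature column, divided by the
    squared batch size. The value is 0.0 when all such column pairs are
    orthogonal across the batch.
    """
    n = len(shared)
    if n == 0 or n != len(private):
        raise ValueError("shared and private need the same non-zero batch size")
    total = 0.0
    for s_col in zip(*shared):        # iterate over shared feature columns
        for p_col in zip(*private):   # iterate over private feature columns
            dot = sum(s * p for s, p in zip(s_col, p_col))
            total += dot * dot
    return total / (n * n)


# Usage: two samples, one shared and one private feature each.
disjoint = orthogonality_penalty([[1.0], [0.0]], [[0.0], [1.0]])
overlapping = orthogonality_penalty([[1.0], [0.0]], [[1.0], [0.0]])
```

Under this sketch, `disjoint` is zero because the two feature columns never activate on the same sample, while `overlapping` is positive. A training loop would add such a term to the task loss so that the two spaces are pushed toward carrying different information.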
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Emotion and Mood Recognition · Generative Adversarial Networks and Image Synthesis
