Enhance-then-Balance Modality Collaboration for Robust Multimodal Sentiment Analysis
Kang He, Yuzhe Ding, Xinrong Wang, Fei Li, Chong Teng, Donghong Ji

TL;DR
This paper introduces EBMC, a novel framework for multimodal sentiment analysis that enhances weaker modalities, balances modality contributions, and improves robustness against noise and missing data.
Contribution
The paper proposes a new model that combines semantic disentanglement, energy-guided coordination, and trust distillation to improve multimodal fusion and robustness.
Findings
EBMC achieves state-of-the-art results on benchmark datasets.
It maintains strong performance even with missing modalities.
The approach effectively balances modality contributions and enhances weaker signals.
Abstract
Multimodal sentiment analysis (MSA) integrates heterogeneous text, audio, and visual signals to infer human emotions. While recent approaches leverage cross-modal complementarity, they often struggle to fully utilize weaker modalities. In practice, dominant modalities tend to overshadow non-verbal ones, inducing modality competition and limiting their contributions. This imbalance degrades fusion performance and robustness under noisy or missing modalities. To address this, we propose a novel model, the Enhance-then-Balance Modality Collaboration (EBMC) framework. EBMC improves representation quality via semantic disentanglement and cross-modal enhancement, strengthening weaker modalities. To prevent dominant modalities from overwhelming others, an Energy-guided Modality Coordination mechanism achieves implicit gradient rebalancing via a differentiable equilibrium objective. Furthermore,…
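The abstract does not spell out the Energy-guided Modality Coordination objective, so the sketch below is a hypothetical illustration of the general idea, not the authors' formulation. It assumes each modality has its own classification head, scores that head with a free energy `E_m = -T * logsumexp(z_m / T)`, and uses the variance of these energies across modalities as a stand-in "equilibrium" penalty. The function names `free_energy`, `equilibrium_penalty`, and `penalty_grad` and the toy logits are illustrative only.

```python
import math
from typing import Dict, List


def free_energy(logits: List[float], temperature: float = 1.0) -> float:
    """Free energy of one unimodal head: E = -T * logsumexp(z / T).

    Lower energy corresponds to larger (more confident) logits.
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    lse = m + math.log(sum(math.exp(s - m) for s in scaled))
    return -temperature * lse


def equilibrium_penalty(energies: Dict[str, float]) -> float:
    """Hypothetical equilibrium objective: variance of per-modality energies.

    It is zero only when every modality sits at the same energy level.
    """
    values = list(energies.values())
    mean = sum(values) / len(values)
    return sum((e - mean) ** 2 for e in values) / len(values)


def penalty_grad(energies: Dict[str, float]) -> Dict[str, float]:
    """Analytic gradient d(penalty)/dE_m = 2 * (E_m - mean) / M.

    A gradient-descent step raises the energy of low-energy (dominant)
    modalities and lowers that of high-energy (weak) ones, pulling all of
    them toward the shared mean. In a real model this penalty would be
    added to the task loss and its gradient would flow into each head's logits.
    """
    n = len(energies)
    mean = sum(energies.values()) / n
    return {k: 2.0 * (e - mean) / n for k, e in energies.items()}


if __name__ == "__main__":
    # Made-up logits for a 3-class sentiment head per modality.
    logits = {
        "text": [4.0, 0.5, -1.0],  # confident, dominant modality
        "audio": [0.6, 0.4, 0.2],  # weak modality
        "visual": [1.0, 0.3, 0.1],
    }
    energies = {m: free_energy(z) for m, z in logits.items()}
    print({m: round(e, 3) for m, e in energies.items()})
    print(round(equilibrium_penalty(energies), 3))
    print({m: round(g, 3) for m, g in penalty_grad(energies).items()})
```

Using the variance is just one simple way to make "balance" differentiable; it penalizes any modality that drifts far from the others in either direction, which is the qualitative behavior the abstract attributes to implicit gradient rebalancing.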
