Which is Making the Contribution: Modulating Unimodal and Cross-modal Dynamics for Multimodal Sentiment Analysis
Ying Zeng, Sijie Mai, Haifeng Hu

TL;DR
This paper introduces M^3SA, a novel framework for multimodal sentiment analysis that modulates unimodal learning and filters noise to improve cross-modal dynamics and overall accuracy.
Contribution
The paper proposes a new MSA framework that explicitly modulates unimodal contributions and filters modality noise, addressing key limitations of previous methods.
Findings
Achieves state-of-the-art performance on public datasets.
Effectively filters noisy modality information.
Improves learning of unimodal and cross-modal dynamics.
Abstract
Multimodal sentiment analysis (MSA) draws increasing attention with the availability of multimodal data. The boost in performance of MSA models is mainly hindered by two problems. On the one hand, recent MSA works mostly focus on learning cross-modal dynamics, but neglect to explore an optimal solution for unimodal networks, which determines the lower limit of MSA models. On the other hand, noisy information hidden in each modality interferes with the learning of correct cross-modal dynamics. To address the above-mentioned problems, we propose a novel MSA framework, Modulation Model for Multimodal Sentiment Analysis (M^3SA), to identify the contribution of modalities and reduce the impact of noisy information, so as to better learn unimodal and cross-modal dynamics. Specifically, modulation loss is designed to modulate the loss contribution…
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Topic Modeling · Multimodal Machine Learning Applications
