Calibrating Multimodal Consensus for Emotion Recognition
Guowei Zhong, Junjie Li, Huaiyu Zhu, Ruohong Huan, Yun Pan

TL;DR
This paper introduces Calibrated Multimodal Consensus (CMC), a model for emotion recognition that addresses semantic inconsistencies across modalities and dominance of the text modality, outperforming state-of-the-art methods on four datasets.
Contribution
The paper proposes CMC, which generates pseudo unimodal labels for self-supervised unimodal pretraining and then applies a parameter-free, consensus-guided fusion process during multimodal finetuning to improve emotion recognition accuracy.
Findings
CMC outperforms state-of-the-art methods on four datasets.
It shows robustness in scenarios with semantic inconsistencies.
The approach mitigates text modality dominance.
Abstract
In recent years, Multimodal Emotion Recognition (MER) has made substantial progress. Nevertheless, most existing approaches neglect the semantic inconsistencies that may arise across modalities, such as conflicting emotional cues between text and visual inputs. Besides, current methods are often dominated by the text modality due to its strong representational capacity, which can compromise recognition accuracy. To address these challenges, we propose a model termed Calibrated Multimodal Consensus (CMC). CMC introduces a Pseudo Label Generation Module (PLGM) to produce pseudo unimodal labels, enabling unimodal pretraining in a self-supervised fashion. It then employs a Parameter-free Fusion Module (PFM) and a Multimodal Consensus Router (MCR) for multimodal finetuning, thereby mitigating text dominance and guiding the fusion process toward a more reliable consensus. Experimental results…
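The abstract does not say how the PFM or the MCR are computed, so the following is only an illustrative sketch of the general idea of fusing without learned weights and then steering the prediction toward a cross-modal consensus. Everything in it is an assumption made for illustration: the function names (`mean_fusion`, `consensus_weights`, `consensus_predict`), the use of an element-wise mean as the parameter-free fusion, and the dot-product agreement score used for routing. It is not the authors' implementation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def mean_fusion(dists):
    """Assumed stand-in for a parameter-free fusion: the element-wise
    mean of the per-modality class distributions (no learned weights)."""
    n = len(dists)
    return [sum(col) / n for col in zip(*dists)]

def consensus_weights(dists, fused):
    """Assumed stand-in for a consensus router: score each modality by the
    dot product of its distribution with the fused one, then normalize."""
    scores = [sum(p * q for p, q in zip(d, fused)) for d in dists]
    return softmax(scores)

def consensus_predict(logits_by_modality):
    """Return (per-modality weights, final class distribution)."""
    names = list(logits_by_modality)
    dists = [softmax(logits_by_modality[name]) for name in names]
    fused = mean_fusion(dists)
    weights = consensus_weights(dists, fused)
    # Final distribution: agreement-weighted sum of unimodal distributions.
    final = [
        sum(w * d[k] for w, d in zip(weights, dists))
        for k in range(len(fused))
    ]
    return dict(zip(names, weights)), final

# Hypothetical logits: text is confident in class 0, while audio and
# visual both favor class 1 (a semantic inconsistency across modalities).
example = {
    "text": [4.0, 0.0, 0.0],
    "audio": [0.0, 2.0, 0.0],
    "visual": [0.0, 2.0, 0.0],
}
weights, final = consensus_predict(example)
```

In this toy case the text modality disagrees with the other two, so it receives a smaller routing weight and the final prediction follows the audio-visual consensus, even though the text logits are the most confident.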
Taxonomy
Topics: Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining · Multimodal Machine Learning Applications
