GIA-MIC: Multimodal Emotion Recognition with Gated Interactive Attention and Modality-Invariant Learning Constraints
Jiajun He, Jinyi Mi, Tomoki Toda

TL;DR
This paper introduces GIA-MIC, a multimodal emotion recognition framework that combines gated interactive attention with modality-invariant learning constraints to improve modality-specific feature extraction and cross-modal similarity alignment, reporting state-of-the-art results on IEMOCAP (WA 80.7%, UA 81.3%).
Contribution
The paper proposes a gated interactive attention mechanism and a modality-invariant generator to extract modality-specific features and align cross-modal representations in multimodal emotion recognition (MER).
Findings
Outperforms state-of-the-art methods on IEMOCAP
Achieves 80.7% weighted accuracy (WA) and 81.3% unweighted accuracy (UA)
Effectively captures modality-specific and cross-modal features
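For readers unfamiliar with the two reported metrics, the sketch below shows how WA and UA are conventionally computed in emotion recognition work: WA is overall accuracy across all samples, and UA is the mean of per-class recalls, so minority classes count equally. The function name and the toy labels are hypothetical illustrations, not data or code from the paper.

```python
from collections import defaultdict


def weighted_and_unweighted_accuracy(y_true, y_pred):
    """Return (WA, UA): overall accuracy and the mean of per-class recalls."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    total_per_class = defaultdict(int)    # number of samples per true class
    correct_per_class = defaultdict(int)  # correctly classified samples per true class
    for t, p in zip(y_true, y_pred):
        total_per_class[t] += 1
        if t == p:
            correct_per_class[t] += 1

    # WA: every sample counts equally, so frequent classes dominate.
    wa = sum(correct_per_class.values()) / len(y_true)
    # UA: every class counts equally, regardless of how many samples it has.
    ua = sum(correct_per_class[c] / n for c, n in total_per_class.items()) / len(total_per_class)
    return wa, ua


# Hypothetical toy labels (not from the paper).
y_true = ["ang", "ang", "ang", "ang", "hap", "hap", "neu", "sad"]
y_pred = ["ang", "ang", "ang", "ang", "hap", "neu", "neu", "ang"]
wa, ua = weighted_and_unweighted_accuracy(y_true, y_pred)
print(f"WA={wa:.3f}, UA={ua:.3f}")
```

On an imbalanced label set the two numbers can diverge noticeably, which is why both are typically reported together.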
Abstract
Multimodal emotion recognition (MER) extracts emotions from multimodal data, including visual, speech, and text inputs, playing a key role in human-computer interaction. Attention-based fusion methods dominate MER research, achieving strong classification performance. However, two key challenges remain: effectively extracting modality-specific features and capturing cross-modal similarities despite distribution differences caused by modality heterogeneity. To address these, we propose a gated interactive attention mechanism to adaptively extract modality-specific features while enhancing emotional information through pairwise interactions. Additionally, we introduce a modality-invariant generator to learn modality-invariant representations and constrain domain shifts by aligning cross-modal similarities. Experiments on IEMOCAP demonstrate that our method outperforms state-of-the-art MER…
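To make the idea of gated pairwise interaction concrete, here is a minimal sketch under stated assumptions. It is not the authors' architecture, which the abstract does not specify. It assumes one modality (say, speech) attends over another (say, text) with plain scaled dot-product attention and no learned projections, and that a per-frame scalar sigmoid gate mixes the modality's own features with the attended features. All function names, the gate parameterization, and the toy vectors are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def cross_attention(queries, keys_values):
    """Scaled dot-product attention: each query frame attends over the other modality."""
    d = len(queries[0])
    attended = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys_values])
        attended.append(
            [sum(w * kv[j] for w, kv in zip(weights, keys_values)) for j in range(d)]
        )
    return attended


def gated_fusion(own, attended, gate_weights, gate_bias=0.0):
    """Assumed gate: g = sigmoid(w . [own; attended] + b); output = g*own + (1-g)*attended."""
    fused = []
    for x, a in zip(own, attended):
        g = 1.0 / (1.0 + math.exp(-(dot(gate_weights, x + a) + gate_bias)))
        fused.append([g * xi + (1.0 - g) * ai for xi, ai in zip(x, a)])
    return fused


# Hypothetical toy features: 2 speech frames and 3 text tokens, dimension 2.
speech = [[1.0, 0.0], [0.0, 1.0]]
text = [[1.0, 1.0], [0.5, -0.5], [0.0, 2.0]]

attended = cross_attention(speech, text)
# Zero gate weights give g = 0.5, an even mix of own and attended features.
fused = gated_fusion(speech, attended, gate_weights=[0.0, 0.0, 0.0, 0.0])
```

In a trained model the gate weights would be learned, letting each frame decide how much cross-modal information to admit; a gate near 1 keeps the modality-specific features, while a gate near 0 favors the interaction output.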
Taxonomy
Topics: Emotion and Mood Recognition
