GIA-MIC: Multimodal Emotion Recognition with Gated Interactive Attention and Modality-Invariant Learning Constraints

Jiajun He; Jinyi Mi; Tomoki Toda

arXiv:2506.00865 · cs.AI · June 3, 2025

TL;DR

This paper introduces GIA-MIC, a multimodal emotion recognition framework that combines gated interactive attention with modality-invariant learning constraints to improve modality-specific feature extraction and cross-modal similarity alignment, reporting state-of-the-art results on IEMOCAP.

Contribution

It proposes a gated interactive attention mechanism to extract modality-specific features and a modality-invariant generator to align cross-modal representations in multimodal emotion recognition (MER).

Findings

01 Outperforms state-of-the-art methods on IEMOCAP

02 Achieves a weighted accuracy (WA) of 80.7% and an unweighted accuracy (UA) of 81.3%

03 Effectively captures modality-specific and cross-modal features
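
For readers unfamiliar with the two reported metrics, here is a minimal sketch of how WA and UA are commonly computed in the IEMOCAP literature. It assumes the usual convention: WA is overall accuracy across all utterances, and UA is the mean of the per-class recalls. The function names and toy labels are illustrative only, and the paper's exact evaluation protocol may differ.

```python
from collections import Counter


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correct predictions divided by all samples."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean per-class recall: every class counts equally, whatever its size."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy, made-up labels: an imbalanced set with 6 "neu" and 2 "ang" samples.
y_true = ["neu"] * 6 + ["ang"] * 2
y_pred = ["neu"] * 6 + ["neu", "ang"]

print(weighted_accuracy(y_true, y_pred))
print(unweighted_accuracy(y_true, y_pred))
```

Because UA averages over classes rather than samples, it penalizes a model that does well only on the majority emotion class, which is why the two numbers are usually reported together.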

Abstract

Multimodal emotion recognition (MER) extracts emotions from multimodal data, including visual, speech, and text inputs, playing a key role in human-computer interaction. Attention-based fusion methods dominate MER research, achieving strong classification performance. However, two key challenges remain: effectively extracting modality-specific features and capturing cross-modal similarities despite distribution differences caused by modality heterogeneity. To address these, we propose a gated interactive attention mechanism to adaptively extract modality-specific features while enhancing emotional information through pairwise interactions. Additionally, we introduce a modality-invariant generator to learn modality-invariant representations and constrain domain shifts by aligning cross-modal similarities. Experiments on IEMOCAP demonstrate that our method outperforms state-of-the-art MER…
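
The abstract describes the gated interactive attention mechanism only at a high level. The following is a minimal sketch of one generic way to gate the output of a pairwise cross-modal attention step; it is not the authors' implementation. The function names, the scalar gate parameters (`gate_weight`, `gate_bias`), and the toy feature values are all assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Scaled dot-product attention: each query row attends over the rows
    of the other modality, which serve as both keys and values."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, keys_values))
                    for j in range(d)])
    return out


def gated_fusion(own, attended, gate_weight, gate_bias):
    """Element-wise sigmoid gate g in (0, 1), computed from the modality's
    own feature: output = g * own + (1 - g) * attended.
    The scalar gate parameters are hypothetical stand-ins for learned weights."""
    fused = []
    for x_row, a_row in zip(own, attended):
        row = []
        for x, a in zip(x_row, a_row):
            g = 1.0 / (1.0 + math.exp(-(gate_weight * x + gate_bias)))
            row.append(g * x + (1.0 - g) * a)
        fused.append(row)
    return fused


# Toy features: 3 speech frames and 2 text tokens, each of dimension 2.
speech = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text = [[0.5, 0.5], [1.0, -1.0]]

speech_from_text = cross_attention(speech, text)  # speech queries attend to text
fused_speech = gated_fusion(speech, speech_from_text, gate_weight=1.0, gate_bias=0.0)
print(fused_speech)
```

In this sketch, the gate decides per element how much of the modality's own feature to keep versus how much cross-modal information to admit. In a trained model the gate would be a learned projection rather than two fixed scalars.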

Taxonomy

Topics: Emotion and Mood Recognition