Mamba-Enhanced Text-Audio-Video Alignment Network for Emotion Recognition in Conversations
Xinran Li, Xiaomao Fan, Qingyang Wu, Xiaojiang Peng, Ye Li

TL;DR
This paper introduces MaTAV, a novel multimodal alignment network that improves emotion recognition in conversations by ensuring data consistency across modalities and capturing long-term contextual information, outperforming existing methods.
Contribution
The paper proposes MaTAV, a new alignment network that effectively synchronizes unimodal features and captures contextual cues in lengthy dialogues for ERC.
Findings
MaTAV significantly outperforms state-of-the-art methods on MELD and IEMOCAP datasets.
Ensures consistency across text, audio, and visual modalities.
Effectively captures evolving emotional context in long conversations.
Abstract
Emotion Recognition in Conversations (ERC) is a vital area within multimodal interaction research, dedicated to accurately identifying and classifying the emotions expressed by speakers throughout a conversation. Traditional ERC approaches predominantly rely on unimodal cues (such as text, audio, or visual data), leading to limitations in their effectiveness. These methods encounter two significant challenges: 1) Consistency in multimodal information. Before integrating various modalities, it is crucial to ensure that the data from different sources is aligned and coherent. 2) Contextual information capture. Successfully fusing multimodal features requires a keen understanding of the evolving emotional tone, especially in lengthy dialogues where emotions may shift and develop over time. To address these limitations, we propose a novel Mamba-enhanced Text-Audio-Video alignment network…
Taxonomy
Topics: Emotion and Mood Recognition · Social Robot Interaction and HRI
