Mamba-Enhanced Text-Audio-Video Alignment Network for Emotion Recognition in Conversations

Xinran Li; Xiaomao Fan; Qingyang Wu; Xiaojiang Peng; Ye Li

arXiv:2409.05243 · cs.CV · September 10, 2024 · 2 cites

TL;DR

This paper introduces MaTAV, a novel multimodal alignment network that improves emotion recognition in conversations by ensuring data consistency across modalities and capturing long-term contextual information, outperforming existing methods.

Contribution

The paper proposes MaTAV, a new alignment network that effectively synchronizes unimodal features and captures contextual cues in lengthy dialogues for ERC.

Findings

1. MaTAV significantly outperforms state-of-the-art methods on MELD and IEMOCAP datasets.
2. Ensures consistency across text, audio, and visual modalities.
3. Effectively captures evolving emotional context in long conversations.

Abstract

Emotion Recognition in Conversations (ERC) is a vital area within multimodal interaction research, dedicated to accurately identifying and classifying the emotions expressed by speakers throughout a conversation. Traditional ERC approaches predominantly rely on unimodal cues (such as text, audio, or visual data), which limits their effectiveness. These methods encounter two significant challenges: 1) Consistency in multimodal information. Before integrating various modalities, it is crucial to ensure that the data from different sources is aligned and coherent. 2) Contextual information capture. Successfully fusing multimodal features requires a keen understanding of the evolving emotional tone, especially in lengthy dialogues where emotions may shift and develop over time. To address these limitations, we propose a novel Mamba-enhanced Text-Audio-Video alignment network…

Code & Models

Repositories

Alena-Xinran/MaTAV (PyTorch, official)

Taxonomy

Topics: Emotion and Mood Recognition · Social Robot Interaction and HRI