AIMDiT: Modality Augmentation and Interaction via Multimodal Dimension Transformation for Emotion Recognition in Conversations

Sheng Wu; Jiaxing Liu; Longbiao Wang; Dongxiao He; Xiaobao Wang; Jianwu Dang

arXiv:2407.00743 · cs.MM · July 2, 2024

TL;DR

AIMDiT introduces a novel multimodal fusion framework for emotion recognition in conversations, leveraging dimension transformation and interaction networks to improve accuracy over state-of-the-art models.

Contribution

The paper presents a new multimodal fusion approach with a Modality Augmentation Network and a Modality Interaction Network, enhancing feature representation and interaction for ERC.

Findings

1. Achieved a 2.34% improvement in the Acc-7 metric.

2. Achieved a 2.87% improvement in the w-F1 metric.

3. Outperforms existing SOTA models on the MELD dataset.
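
For readers unfamiliar with the two metrics, the following is a minimal sketch of how they are commonly computed. It assumes that Acc-7 denotes plain accuracy over the seven MELD emotion classes and that w-F1 denotes the support-weighted average of per-class F1 scores; the page does not define either metric, so both readings are assumptions. The function names `accuracy` and `weighted_f1` and the toy labels are illustrative only and are not taken from the paper.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of utterances whose predicted label equals the gold label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights proportional to class support.

    Assumption: this is the usual definition of "w-F1"; the paper may
    compute it differently.
    """
    support = Counter(y_true)  # number of gold utterances per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1
    return score


# Toy example with made-up labels (not MELD data).
gold = ["joy", "joy", "neutral", "anger"]
pred = ["joy", "neutral", "neutral", "anger"]
print(accuracy(gold, pred))
print(weighted_f1(gold, pred))
```

Weighting by support matters on MELD-style data because the emotion classes are imbalanced, so an unweighted macro average and a weighted average can rank the same models differently.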

Abstract

Emotion Recognition in Conversations (ERC) is a popular task in natural language processing that aims to recognize the emotional state of the speaker in a conversation. While current research primarily emphasizes contextual modeling, effective multimodal fusion methods remain underexplored. We propose a novel framework called AIMDiT to solve the problem of multimodal fusion of deep features. Specifically, we design a Modality Augmentation Network, which performs rich representation learning through dimension transformation of different modalities and a parameter-efficient inception block. In addition, the Modality Interaction Network performs interaction fusion of the extracted inter-modal and intra-modal features. Experiments conducted using our AIMDiT framework on the public benchmark dataset MELD reveal 2.34% and 2.87% improvements in terms of the…

Taxonomy

Topics: Emotion and Mood Recognition