AIMDiT: Modality Augmentation and Interaction via Multimodal Dimension Transformation for Emotion Recognition in Conversations
Sheng Wu, Jiaxing Liu, Longbiao Wang, Dongxiao He, Xiaobao Wang, Jianwu Dang

TL;DR
AIMDiT introduces a novel multimodal fusion framework for emotion recognition in conversations, leveraging dimension transformation and interaction networks to improve accuracy over state-of-the-art models.
Contribution
The paper presents a new multimodal fusion approach built on two components, a Modality Augmentation Network and a Modality Interaction Network, which enhance feature representation and feature interaction for emotion recognition in conversations (ERC).
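The abstract says the Modality Interaction Network fuses inter-modal and intra-modal features, but this summary does not say how. The sketch below is a minimal illustration under stated assumptions, not the authors' design: each modality is reduced to one utterance-level vector, the inter-modal feature is a scaled dot-product attention summary of the other modalities, and fusion is plain concatenation. The function names (`cross_modal_context`, `interaction_fusion`) are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_modal_context(query, others):
    """Inter-modal feature (assumed): attention-weighted sum of the other
    modalities' vectors, scored by scaled dot product with the query."""
    d = len(query)
    weights = softmax([dot(query, o) / math.sqrt(d) for o in others])
    return [sum(w * o[i] for w, o in zip(weights, others)) for i in range(d)]

def interaction_fusion(features):
    """For each modality, concatenate its own (intra-modal) vector with its
    inter-modal context. Requires at least two modalities."""
    fused = {}
    for name, vec in features.items():
        others = [v for n, v in features.items() if n != name]
        fused[name] = vec + cross_modal_context(vec, others)
    return fused

features = {"text": [1.0, 0.0], "audio": [0.0, 1.0], "visual": [0.0, 1.0]}
print(interaction_fusion(features)["text"])  # [1.0, 0.0, 0.0, 1.0]
```

Concatenation doubles each modality's width; a learned projection or gating layer would normally follow, and is omitted here to keep the interaction step visible.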
Findings
Achieved a 2.34% improvement in Acc-7 (seven-class accuracy).
Achieved a 2.87% improvement in w-F1 (weighted F1).
Outperforms existing SOTA models on the MELD dataset.
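The two reported metrics can be computed as follows, assuming the usual reading of Acc-7 as accuracy over MELD's seven emotion classes (neutral, joy, surprise, anger, sadness, disgust, fear) and w-F1 as per-class F1 averaged with weights proportional to each class's support. The paper's own evaluation script is not shown in this summary, so treat this as a reference sketch.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of utterances whose predicted label matches the gold label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights proportional to gold-label support."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * n / total
    return score

gold = ["neutral", "neutral", "joy", "anger"]
pred = ["neutral", "joy", "joy", "anger"]
print(accuracy(gold, pred))  # 0.75
print(weighted_f1(gold, pred))
```

Classes that never appear in the gold labels get zero weight, which matches the common "weighted" averaging convention.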
Abstract
Emotion Recognition in Conversations (ERC) is a popular task in natural language processing, which aims to recognize the emotional state of the speaker in conversations. While current research primarily emphasizes contextual modeling, there exists a dearth of investigation into effective multimodal fusion methods. We propose a novel framework called AIMDiT to solve the problem of multimodal fusion of deep features. Specifically, we design a Modality Augmentation Network which performs rich representation learning through dimension transformation of different modalities and a parameter-efficient inception block. In addition, the Modality Interaction Network performs interaction fusion of the extracted inter-modal and intra-modal features. Experiments conducted using our AIMDiT framework on the public benchmark dataset MELD reveal 2.34% and 2.87% improvements in terms of the…
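The abstract does not spell out what "dimension transformation" involves. One plausible reading, assumed here and not confirmed by this summary, is that each modality's flat feature vector is reshaped into a 2-D grid so that 2-D convolutions can mix neighboring feature dimensions, with an inception-style block running several kernel sizes in parallel and averaging the branches. The sketch uses fixed, hand-written kernels in pure Python for clarity; a real network would learn multi-channel kernels in a deep-learning framework. All function names are hypothetical.

```python
def to_grid(vec, rows):
    """Assumed dimension transformation: reshape a flat feature vector
    into a rows x cols grid (row-major)."""
    if len(vec) % rows:
        raise ValueError("feature length must be divisible by rows")
    cols = len(vec) // rows
    return [vec[r * cols:(r + 1) * cols] for r in range(rows)]

def conv2d_same(grid, kernel):
    """2-D cross-correlation with zero padding; for odd kernel sizes the
    output keeps the input's shape."""
    rows, cols = len(grid), len(grid[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    rr, cc = r + i - pad, c + j - pad
                    if 0 <= rr < rows and 0 <= cc < cols:
                        acc += kernel[i][j] * grid[rr][cc]
            out[r][c] = acc
    return out

def inception_block(grid, kernels):
    """Inception-style block (assumed form): parallel convolutions with
    different kernel sizes, averaged element-wise."""
    branches = [conv2d_same(grid, k) for k in kernels]
    rows, cols = len(grid), len(grid[0])
    return [[sum(b[r][c] for b in branches) / len(branches) for c in range(cols)]
            for r in range(rows)]

def flatten(grid):
    """Inverse of to_grid: back to a flat feature vector."""
    return [x for row in grid for x in row]

features = [float(i) for i in range(12)]
grid = to_grid(features, rows=3)
identity = [[1.0]]
box3 = [[1.0 / 9] * 3 for _ in range(3)]
augmented = flatten(inception_block(grid, [identity, box3]))
print(len(augmented))  # 12
```

Averaging the branches keeps the output the same size as the input, so the augmented vector can replace the original feature downstream; concatenating branches instead would widen it.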
Taxonomy
Topics: Emotion and Mood Recognition
