Multimodal Fusion via Hypergraph Autoencoder and Contrastive Learning for Emotion Recognition in Conversation

Zijian Yi; Ziming Zhao; Zhishu Shen; Tiehua Zhang

arXiv:2408.00970·cs.MM·July 25, 2025

TL;DR

This paper introduces a novel multimodal emotion recognition framework that uses a variational hypergraph autoencoder and contrastive learning to improve context modeling and fusion, outperforming existing methods on benchmark datasets.

Contribution

The paper proposes adjusting hypergraph connections dynamically with a variational hypergraph autoencoder (VHGAE) and incorporates contrastive learning to improve multimodal emotion recognition accuracy.

Findings

01 Outperforms state-of-the-art methods on the IEMOCAP and MELD datasets.

02 Effectively models long-distance conversational context.

03 Reduces redundancy and over-smoothing in hypergraph-based context modeling.

Abstract

Multimodal emotion recognition in conversation (MERC) seeks to identify the emotion a speaker expresses in each utterance, offering significant potential across diverse fields. The challenge of MERC lies in balancing speaker modeling and context modeling, covering both long-distance and short-distance contexts, and in handling the complexity of multimodal information fusion. Recent research adopts graph-based methods to model intricate conversational relationships effectively. Nevertheless, most of these methods use a fixed, fully connected structure to link all utterances and rely on convolution to interpret complex context. This approach can inherently increase redundancy in contextual messages and cause excessive graph network smoothing, particularly in long-distance conversations. To address this issue, we propose a framework that dynamically…
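The smoothing concern raised in the abstract can be illustrated with a toy example. The sketch below is not the paper's method: it assumes plain mean aggregation as a simplified stand-in for graph convolution, uses made-up 2-d utterance embeddings, and compares a fixed fully connected structure against a hypothetical local-window structure chosen only for contrast. The helper names `mean_aggregate` and `spread` are illustrative.

```python
def mean_aggregate(features, neighbors):
    """One round of mean aggregation: each node averages its neighbors' features."""
    dim = len(features[0])
    return [
        [sum(features[j][d] for j in nbrs) / len(nbrs) for d in range(dim)]
        for nbrs in neighbors
    ]


def spread(features):
    """Largest per-dimension gap between any two nodes (0 means all nodes are identical)."""
    dim = len(features[0])
    return max(
        max(f[d] for f in features) - min(f[d] for f in features)
        for d in range(dim)
    )


# Four hypothetical utterance embeddings (toy values, not taken from the paper).
utterances = [[1.0, 0.0], [0.0, 1.0], [4.0, 2.0], [3.0, 5.0]]
n = len(utterances)

# Fixed fully connected structure: every utterance is linked to every utterance, itself included.
fully_connected = [list(range(n)) for _ in range(n)]

# Sparser structure for contrast (an assumption): each utterance sees itself and adjacent turns.
local_window = [[j for j in (i - 1, i, i + 1) if 0 <= j < n] for i in range(n)]

fc_out = mean_aggregate(utterances, fully_connected)
local_out = mean_aggregate(utterances, local_window)

print("spread before aggregation:", spread(utterances))
print("spread after fully connected aggregation:", spread(fc_out))
print("spread after local-window aggregation:", spread(local_out))
```

Under these assumptions, a single fully connected aggregation step replaces every utterance representation with the same global mean, so the per-utterance distinctions a classifier would need are lost, while the sparser structure keeps the representations apart. This is only a simplified picture of why a fixed fully connected structure can over-smooth.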

Taxonomy

Topics: Emotion and Mood Recognition