Tracing Intricate Cues in Dialogue: Joint Graph Structure and Sentiment Dynamics for Multimodal Emotion Recognition
Jiang Li, Xiaoping Wang, Zhigang Zeng

TL;DR
GraphSmile is a multimodal emotion recognition approach that uses graph structures and sentiment dynamics modeling to capture emotional cues in dialogue and improve recognition accuracy.
Contribution
The paper introduces a two-module framework (GSF and SDP) that models inter- and intra-modal dependencies and explicitly captures sentiment shifts.
Findings
Significantly outperforms baseline models on multiple benchmarks.
Effectively captures complex emotional and sentimental patterns.
Handles abrupt sentiment shifts with high accuracy.
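To make the last finding concrete, here is a minimal sketch of how accuracy on sentiment-shift utterances could be measured. It is not the paper's evaluation protocol: the function names, the label strings, and the definition of a shift (sentiment differs from the immediately preceding utterance) are all assumptions made for illustration.

```python
from typing import List, Optional


def sentiment_shift_flags(sentiments: List[str]) -> List[bool]:
    """Flag utterances whose sentiment differs from the previous utterance.

    Assumption: a "shift" is any change relative to the immediately
    preceding utterance; the first utterance is never flagged.
    """
    flags = [False] * len(sentiments)
    for i in range(1, len(sentiments)):
        flags[i] = sentiments[i] != sentiments[i - 1]
    return flags


def accuracy_on_shifts(gold: List[str], pred: List[str]) -> Optional[float]:
    """Accuracy restricted to utterances flagged as shifts in the gold labels."""
    flags = sentiment_shift_flags(gold)
    pairs = [(g, p) for g, p, f in zip(gold, pred, flags) if f]
    if not pairs:
        return None  # no shift utterances in this dialogue
    return sum(g == p for g, p in pairs) / len(pairs)


# Hypothetical five-utterance dialogue with made-up labels.
gold = ["pos", "pos", "neg", "neu", "neu"]
pred = ["pos", "neg", "neg", "pos", "neu"]
shift_accuracy = accuracy_on_shifts(gold, pred)
```

Reporting this number alongside overall accuracy separates performance on stable stretches of a dialogue from performance at the points where sentiment changes.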
Abstract
Multimodal emotion recognition in conversation (MERC) has garnered substantial research attention recently. Existing MERC methods face several challenges: (1) they fail to fully harness direct inter-modal cues, possibly leading to less-than-thorough cross-modal modeling; (2) they concurrently extract information from the same and different modalities at each network layer, potentially triggering conflicts from the fusion of multi-source data; (3) they lack the agility required to detect dynamic sentimental changes, perhaps resulting in inaccurate classification of utterances with abrupt sentiment shifts. To address these issues, a novel approach named GraphSmile is proposed for tracking intricate emotional cues in multimodal dialogues. GraphSmile comprises two key components, i.e., GSF and SDP modules. GSF ingeniously leverages graph structures to alternately assimilate inter-modal and…
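The idea of handling cross-modal and same-modal information in separate, alternating steps (rather than in the same layer, as in challenge 2) can be illustrated with a toy sketch. This is not the GSF module itself: the mean-based mixing rule, the `alpha` weight, the context `window`, and all function names are assumptions chosen to keep the example self-contained.

```python
from typing import Dict, List, Tuple

Node = Tuple[int, str]              # (utterance index, modality name)
Features = Dict[Node, List[float]]


def mean_vectors(vectors: List[List[float]]) -> List[float]:
    """Element-wise mean of equally sized vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def mix(own: List[float], neighbours: List[List[float]], alpha: float = 0.5) -> List[float]:
    """Blend a node's features with the mean of its neighbours (assumed rule)."""
    if not neighbours:
        return list(own)
    avg = mean_vectors(neighbours)
    return [alpha * x + (1 - alpha) * y for x, y in zip(own, avg)]


def inter_modal_step(feats: Features) -> Features:
    """Each node attends only to other modalities of the same utterance."""
    return {
        (utt, mod): mix(vec, [v for (u, m), v in feats.items() if u == utt and m != mod])
        for (utt, mod), vec in feats.items()
    }


def intra_modal_step(feats: Features, window: int = 1) -> Features:
    """Each node attends only to nearby utterances in the same modality."""
    return {
        (utt, mod): mix(
            vec,
            [v for (u, m), v in feats.items()
             if m == mod and u != utt and abs(u - utt) <= window],
        )
        for (utt, mod), vec in feats.items()
    }


def alternate(feats: Features, num_layers: int = 2, window: int = 1) -> Features:
    """Apply inter-modal and intra-modal steps in turn, never in the same step."""
    for _ in range(num_layers):
        feats = inter_modal_step(feats)
        feats = intra_modal_step(feats, window)
    return feats


# Toy dialogue: two utterances, two modalities, one-dimensional features.
toy = {
    (0, "text"): [1.0], (0, "audio"): [3.0],
    (1, "text"): [5.0], (1, "audio"): [7.0],
}
after_inter = inter_modal_step(toy)
after_one_layer = alternate(toy, num_layers=1)
```

Each step only sees one kind of edge, so the cross-modal and same-modal signals are never fused in a single update. A learned model would replace the fixed averaging with trainable graph layers.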
Taxonomy
Topics: Speech and dialogue systems · Topic Modeling
Methods: Softmax · Attention Is All You Need
