XEmoGPT: An Explainable Multimodal Emotion Recognition Framework with Cue-Level Perception and Reasoning
Hanwen Zhang, Yao Liu, Peiyuan Jiang, Lang Junjie, Xie Jun, Yihui He, Yajiao Deng, Siyu Du, and Qiao Liu

TL;DR
XEmoGPT is a novel framework that improves multimodal emotion recognition by perceiving and reasoning over emotional cues, supported by a new large-scale dataset and specialized evaluation metrics.
Contribution
It introduces XEmoGPT with cue-level perception and reasoning modules, along with the EmoCue datasets and the EmoCue-360 metric for enhanced emotional cue analysis.
Findings
XEmoGPT outperforms existing models in cue perception and reasoning.
The EmoCue dataset enables effective training for cue-level emotion understanding.
EmoCue-360 provides a reliable automated evaluation of emotional cue reasoning.
Abstract
Explainable Multimodal Emotion Recognition (EMER) plays a crucial role in applications such as human-computer interaction and social media analytics. However, current approaches struggle with cue-level perception and reasoning due to two main challenges: 1) general-purpose modality encoders are pretrained to capture global structures and general semantics rather than fine-grained emotional cues, resulting in limited sensitivity to emotional signals; and 2) available datasets usually involve a trade-off between annotation quality and scale, which leads to insufficient supervision for emotional cues and ultimately limits cue-level reasoning. Moreover, existing evaluation metrics are inadequate for assessing cue-level reasoning performance. To address these challenges, we propose eXplainable Emotion GPT (XEmoGPT), a novel EMER framework capable of both perceiving and reasoning over emotional…
Taxonomy
Topics: Emotion and Mood Recognition · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications
