EMO-Reasoning: Benchmarking Emotional Reasoning Capabilities in Spoken Dialogue Systems

Jingwen Liu; Kan Jen Cheng; Jiachen Lian; Akshay Anand; Rishi Jain; Faith Qiao; Robin Netzorg; Huang-Cheng Chou; Tingle Li; Guan-Ting Lin; Gopala Anumanchipalli

arXiv:2508.17623 · cs.CL · August 27, 2025

TL;DR

This paper introduces EMO-Reasoning, a benchmark for evaluating emotional reasoning in spoken dialogue systems. It pairs a dataset generated via text-to-speech with continuous, categorical, and perceptual metrics to detect emotional inconsistencies in system responses.

Contribution

It presents a novel benchmark and dataset for assessing emotional reasoning in dialogue systems, addressing the lack of holistic evaluation tools for emotion-aware interactions.

Findings

01 Effective detection of emotional inconsistencies in dialogue systems

02 Benchmark facilitates comparison of emotional reasoning capabilities

03 Provides insights for enhancing emotion-aware dialogue modeling

Abstract

Speech emotions play a crucial role in human-computer interaction, shaping engagement and context-aware communication. Despite recent advances in spoken dialogue systems, a holistic system for evaluating emotional reasoning is still lacking. To address this, we introduce EMO-Reasoning, a benchmark for assessing emotional coherence in dialogue systems. It leverages a curated dataset generated via text-to-speech to simulate diverse emotional states, overcoming the scarcity of emotional speech data. We further propose the Cross-turn Emotion Reasoning Score to assess the emotion transitions in multi-turn dialogues. Evaluating seven dialogue systems through continuous, categorical, and perceptual metrics, we show that our framework effectively detects emotional inconsistencies, providing insights for improving current dialogue systems. By releasing a systematic evaluation benchmark, we aim…
