HumDial-EIBench: A Human-Recorded Multi-Turn Emotional Intelligence Benchmark for Audio Language Models

Shuiyuan Wang; Zhixian Zhao; Hongfei Xue; Chengyou Wang; Shuai Wang; Hui Bu; Xin Xu; Lei Xie

arXiv:2604.11594 · eess.AS · April 27, 2026

TL;DR

HumDial-EIBench is a new benchmark that uses real recorded human dialogues to evaluate the emotional intelligence of audio language models across multi-turn interactions and causal reasoning, addressing the limitations of earlier benchmarks built on synthesized speech.

Contribution

It introduces a comprehensive, multi-turn, real-recorded dialogue benchmark with novel tasks to assess emotional tracking, causal reasoning, and robustness in audio language models.

Findings

01 Most models struggle with multi-turn emotional tracking.

02 Models show decoupled textual and acoustic empathy.

03 Models show a severe text-dominance bias under acoustic-semantic conflict.

Abstract

Evaluating the emotional intelligence (EI) of audio language models (ALMs) is critical. However, existing benchmarks mostly rely on synthesized speech, are limited to single-turn interactions, and depend heavily on open-ended scoring. This paper proposes HumDial-EIBench, a comprehensive benchmark for evaluating ALMs' EI. Using real-recorded human dialogues from the ICASSP 2026 HumDial Challenge, it reformulates emotional tracking and causal reasoning into multiple-choice questions with adversarial distractors, mitigating subjective scoring bias for cognitive tasks. It retains the generation of empathetic responses and introduces an acoustic-semantic conflict task to assess robustness against contradictory multimodal signals. Evaluations of eight ALMs reveal that most models struggle with multi-turn emotional tracking and implicit causal reasoning. Furthermore, all models exhibit…
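
Reformulating emotional tracking and causal reasoning as multiple-choice questions means the cognitive tasks can be scored by exact match rather than by an open-ended judge. The snippet below is a minimal sketch of that kind of scoring, not the benchmark's own evaluation code: the `MCQItem` schema, the task names, the `accuracy_by_task` helper, and the toy items are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MCQItem:
    """One multiple-choice item (hypothetical schema, not the benchmark's own)."""
    task: str            # e.g. "emotional_tracking" or "causal_reasoning"
    options: tuple       # answer options, including the distractors
    answer_index: int    # index of the correct option


def accuracy_by_task(items, predictions):
    """Return {task: fraction of items whose predicted index is correct}."""
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for item, pred in zip(items, predictions):
        total[item.task] += 1
        if pred == item.answer_index:
            correct[item.task] += 1
    return {task: correct[task] / total[task] for task in total}


# Toy data invented for illustration only.
items = [
    MCQItem("emotional_tracking", ("calm -> angry", "calm -> sad", "sad -> happy"), 0),
    MCQItem("emotional_tracking", ("happy -> sad", "happy -> anxious", "angry -> calm"), 1),
    MCQItem("causal_reasoning", ("missed the train", "lost a wallet", "failed an exam"), 2),
]
predictions = [0, 0, 2]  # option indices chosen by a hypothetical model

scores = accuracy_by_task(items, predictions)
print(scores)
```

Because each item has exactly one correct index, two evaluators running this on the same predictions get the same score, which is the property the multiple-choice reformulation is meant to provide.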
