SDiaReward: Modeling and Benchmarking Spoken Dialogue Rewards with Modality and Colloquialness

Jingyu Lu; Yuhan Wang; Fan Zhuo; Xize Cheng; Changhao Pan; Xueyi Pu; Yifu Chen; Chenyuhao Wen; Tianle Liang; Zhou Zhao

arXiv:2603.14889 · eess.AS · May 12, 2026

TL;DR

SDiaReward is a reward model for spoken dialogue systems that scores full multi-turn speech episodes directly from audio, jointly assessing modality (prosody and emotion) and colloquialness (scripted versus natural speech).

Contribution

The paper introduces SDiaReward, an end-to-end multi-turn reward model trained on SDiaReward-Dataset, a new collection of episode-level preference pairs, and establishes ESDR-Bench, a stratified benchmark for episode-level evaluation.

Findings

01 Achieves state-of-the-art preference accuracy.
02 Outperforms general-purpose audio LLMs.
03 Captures conversational expressiveness beyond superficial cues.

Abstract

The rapid evolution of end-to-end spoken dialogue systems demands transcending mere textual semantics to incorporate paralinguistic nuances and the spontaneous nature of human conversation. However, current methods struggle with two critical gaps: the modality gap, involving prosody and emotion, and the colloquialness gap, distinguishing written scripts from natural speech. To address these challenges, we introduce SDiaReward, an end-to-end multi-turn reward model trained on SDiaReward-Dataset, a novel collection of episode-level preference pairs explicitly targeting these gaps. It operates directly on full multi-turn speech episodes and is optimized with pairwise preference supervision, enabling joint assessment of modality and colloquialness in a single evaluator. We further establish ESDR-Bench, a stratified benchmark for robust episode-level evaluation. Experiments demonstrate that…
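
The abstract states that the model is optimized with pairwise preference supervision, and the findings report preference accuracy. The sketch below illustrates what such an objective and metric commonly look like. It assumes a Bradley-Terry style logistic loss, which the abstract does not confirm, and it uses made-up scalar scores in place of real model outputs. The function names `pairwise_preference_loss` and `preference_accuracy` are hypothetical and are not taken from the SDiaReward repository.

```python
import math


def pairwise_preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry style loss: -log(sigmoid(r_chosen - r_rejected)).

    ASSUMPTION: the paper's exact loss is not given in the abstract;
    this is a common choice for pairwise preference training.
    """
    margin = r_chosen - r_rejected
    # -log(sigmoid(m)) equals softplus(-m); branch for numerical stability.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def preference_accuracy(pairs) -> float:
    """Fraction of pairs where the preferred episode scores strictly higher."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one preference pair")
    correct = sum(1 for chosen, rejected in pairs if chosen > rejected)
    return correct / len(pairs)


# Illustrative (chosen, rejected) scores; NOT results from the paper.
example_pairs = [(2.0, 0.5), (0.1, 0.4), (1.2, 1.0), (-0.3, -1.5)]

mean_loss = sum(
    pairwise_preference_loss(c, r) for c, r in example_pairs
) / len(example_pairs)
accuracy = preference_accuracy(example_pairs)

print(f"mean loss: {mean_loss:.4f}")
print(f"preference accuracy: {accuracy:.2f}")
```

In a real setup each score would come from the reward model evaluated on an entire multi-turn speech episode, and the loss would be averaged over batches of episode-level preference pairs.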

Code & Models

Repositories

MM-Speech/SDiaReward (GitHub)
