Evaluating the Expressive Appropriateness of Speech in Rich Contexts

Tianrui Wang; Ziyang Ma; Yizhou Peng; Haoyu Wang; Zhikang Niu; Zikang Huang; Yihao Wu; Yi-Wen Chao; Yu Jiang; Yuheng Lu; Guanrou Yang; Xuanchen Li; Hexin Liu; Chunyu Qiang; Cheng Gong; Yifan Yang; Tianchi Liu; Junyu Wang; Nana Hou; Meng Ge; Fuming You; Wei Yang; Zhongqian Sun; Haifeng Hu; Xiaobao Wang; Eng Siong Chng; Xie Chen; Longbiao Wang; Jianwu Dang

arXiv:2605.09413·eess.AS·May 12, 2026

TL;DR

This paper introduces CEAEval, a novel framework and dataset for evaluating whether speech expressively matches its narrative context, addressing limitations of existing emotion-focused methods.

Contribution

It presents CEAEval, a context-rich evaluation framework, together with CEAEval-D, the first context-rich Mandarin speech dataset annotated for expressive appropriateness.

Findings

01. CEAEval-M outperforms existing speech evaluation systems.

02. CEAEval-D is the first dataset with narrative context and expressive annotations in Mandarin.

03. The framework effectively assesses speech appropriateness in narrative contexts.

Abstract

Evaluating expressive speech remains challenging, as existing methods mainly assess emotional intensity and overlook whether a speech sample is expressively appropriate for its contextual setting. This limitation hinders reliable evaluation of speech systems used in narrative-driven and interactive applications, such as audiobooks and conversational agents. We introduce CEAEval, a Context-rich framework for Evaluating Expressive Appropriateness in speech, which assesses whether a speech sample expressively aligns with the underlying communicative intent implied by its discourse-level narrative context. To support this task, we construct CEAEval-D, the first context-rich speech dataset with real human performances in Mandarin conversational speech, providing narrative descriptions together with fifteen dimensions of human annotations covering expressive attributes and expressive…
