SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words
Junyi Ao, Yuancheng Wang, Xiaohai Tian, Dekun Chen, Jun Zhang, Lu Lu, Yuxuan Wang, Haizhou Li, Zhizheng Wu

TL;DR
SD-Eval is a comprehensive benchmark dataset designed to evaluate spoken dialogue understanding and generation, emphasizing paralinguistic and environmental information to improve model responses and evaluation metrics.
Contribution
The paper introduces SD-Eval, a novel open-source dataset for multidimensional spoken dialogue understanding, including paralinguistic and environmental data, and demonstrates its effectiveness in model evaluation.
Findings
Models conditioned on paralinguistic information outperform models without it.
LLM-based metrics correlate better with human judgments than traditional metrics do.
SD-Eval enhances evaluation of speech-based dialogue systems.
Abstract
Speech encompasses a wealth of information, including but not limited to content, paralinguistic, and environmental information. This comprehensive nature of speech significantly impacts communication and is crucial for human-computer interaction. Chat-Oriented Large Language Models (LLMs), known for their general-purpose assistance capabilities, have evolved to handle multi-modal inputs, including speech. Although these models can be adept at recognizing and analyzing speech, they often fall short of generating appropriate responses. We argue that this is due to the lack of principles on task definition and model development, which requires open-source datasets and metrics suitable for model evaluation. To bridge the gap, we present SD-Eval, a benchmark dataset aimed at multidimensional evaluation of spoken dialogue understanding and generation. SD-Eval focuses on paralinguistic and…
Taxonomy
Topics: Speech and dialogue systems · Natural Language Processing Techniques · Topic Modeling
Methods: Sparse Evolutionary Training
