SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words

Junyi Ao; Yuancheng Wang; Xiaohai Tian; Dekun Chen; Jun Zhang; Lu Lu; Yuxuan Wang; Haizhou Li; Zhizheng Wu

arXiv:2406.13340 · cs.CL · January 17, 2025 · 1 citation

TL;DR

SD-Eval is a comprehensive benchmark dataset designed to evaluate spoken dialogue understanding and generation, emphasizing paralinguistic and environmental information to improve model responses and evaluation metrics.

Contribution

The paper introduces SD-Eval, a novel open-source dataset for multidimensional spoken dialogue understanding, including paralinguistic and environmental data, and demonstrates its effectiveness in model evaluation.

Findings

01

Models conditioned on paralinguistic and environmental information outperform their counterparts.

02

LLM-based metrics correlate better with human judgments than traditional metrics do.

03

SD-Eval enhances evaluation of speech-based dialogue systems.
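
As a rough illustration of what the second finding means in practice, the sketch below compares two automatic metrics by how well their scores track human ratings, using Spearman rank correlation. Everything here is an assumption for illustration: the scores are made up, the function names are hypothetical, and the choice of Spearman correlation is ours rather than something this page states about the paper's evaluation.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up scores for five responses (illustrative only, not from SD-Eval).
human = [1, 2, 3, 4, 5]                # hypothetical human ratings
metric_a = [0.1, 0.2, 0.3, 0.4, 0.5]   # hypothetical metric that agrees with humans
metric_b = [0.5, 0.1, 0.4, 0.2, 0.3]   # hypothetical metric that mostly disagrees

if __name__ == "__main__":
    print("metric_a vs human:", spearman(human, metric_a))
    print("metric_b vs human:", spearman(human, metric_b))
```

Under this setup, the metric with the higher coefficient is the one whose ranking of responses is closer to the human ranking, which is the sense in which one metric "correlates better" than another.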

Abstract

Speech encompasses a wealth of information, including but not limited to content, paralinguistic, and environmental information. This comprehensive nature of speech significantly impacts communication and is crucial for human-computer interaction. Chat-Oriented Large Language Models (LLMs), known for their general-purpose assistance capabilities, have evolved to handle multi-modal inputs, including speech. Although these models can be adept at recognizing and analyzing speech, they often fall short of generating appropriate responses. We argue that this is due to the lack of principles on task definition and model development, which requires open-source datasets and metrics suitable for model evaluation. To bridge the gap, we present SD-Eval, a benchmark dataset aimed at multidimensional evaluation of spoken dialogue understanding and generation. SD-Eval focuses on paralinguistic and…

Code & Models

Repositories

amphionspace/sd-eval
Official

Datasets

amphion/SD-Eval
dataset · 69 downloads

Videos

SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words · SlidesLive

Taxonomy

Topics: Speech and dialogue systems · Natural Language Processing Techniques · Topic Modeling

Methods: Sparse Evolutionary Training