C3: A Bilingual Benchmark for Spoken Dialogue Models Exploring Challenges in Complex Conversations
Chengqian Ma, Wei Tao, Yiwen Guo

TL;DR
This paper introduces C3, a bilingual (English and Chinese) benchmark dataset and evaluation method for spoken dialogue models. It targets challenges such as ambiguity and context-dependency in complex human conversations, with the aim of improving the models' practical effectiveness.
Contribution
It provides a new bilingual benchmark dataset and an LLM-based evaluation approach specifically designed for assessing spoken dialogue models in complex conversational scenarios.
Findings
Benchmark dataset with 1,079 instances in English and Chinese
Evaluation method aligned with human judgment
Insights into the ability of spoken dialogue models (SDMs) to handle ambiguity and context-dependency
Abstract
Spoken Dialogue Models (SDMs) have recently attracted significant attention for their ability to generate voice responses directly to users' spoken queries. Despite their increasing popularity, there exists a gap in research focused on comprehensively understanding their practical effectiveness in comprehending and emulating human conversations. This is especially true compared to text-based Large Language Models (LLMs), which benefit from extensive benchmarking. Human voice interactions are inherently more complex than text due to characteristics unique to spoken dialogue. Ambiguity poses one challenge, stemming from semantic factors like polysemy, as well as phonological aspects such as heterographs, heteronyms, and stress patterns. Additionally, context-dependency, like omission, coreference, and multi-turn interaction, adds further complexity to human conversational dynamics. To…
Taxonomy
Topics: Speech and dialogue systems · Topic Modeling
