Speech Aware Dialog System Technology Challenge (DSTC11)
Hagen Soltau, Izhak Shafran, Mingqiu Wang, Abhinav Rastogi, Jeffrey Zhao, Ye Jia, Wei Han, Yuan Cao, Aramys Miranda

TL;DR
This paper introduces a new speech-aware dialog state tracking challenge with a public corpus, aiming to bridge the gap between spoken and written dialog systems and evaluate models on speech input variations.
Contribution
It creates a comprehensive speech-based dataset for dialog systems, including TTS, human speech, and ASR outputs, facilitating research on spoken language processing.
Findings
Teams achieved varying performance levels on speech variants.
TTS-based data can approximate human speech in dialog tasks.
Current models still struggle with speech input variability.
Abstract
Most research on task-oriented dialog modeling is based on written text input. However, users often interact with practical dialog systems using speech as input. Typically, systems convert speech into text using an Automatic Speech Recognition (ASR) system, introducing errors. Furthermore, these systems do not address the differences between written and spoken language. Research on this topic is stymied by the lack of a public corpus. Motivated by these considerations, our goal in hosting the speech-aware dialog state tracking challenge was to create a public corpus or task which can be used to investigate the performance gap between the written and spoken forms of input, develop models that could alleviate this gap, and establish whether Text-to-Speech-based (TTS) systems are a reasonable surrogate for the more labor-intensive human data collection. We created three spoken versions of the…
Taxonomy
Topics: Speech and dialogue systems · Topic Modeling · Natural Language Processing Techniques
