StarDrinks: An English and Korean Test Set for SLU Evaluation in a Drink Ordering Scenario
Marcely Zanon Boito, Caroline Brun, Inyoung Kim, Denys Proux, Salah Ait-Mokhtar, Nikolaos Lagos, Jean-Luc Meunier, Ioan Calapodescu

TL;DR
StarDrinks is a bilingual test set designed to evaluate speech and language understanding models in realistic drink ordering scenarios, capturing linguistic variability and spontaneous speech phenomena.
Contribution
We introduce StarDrinks, a new English and Korean test set with speech utterances, transcriptions, and annotated slots for SLU, NLU, and ASR evaluation in a realistic drink ordering context.
Findings
Supports speech-to-slots, transcription-to-slots, and speech-to-transcription evaluations.
Captures diverse named entities, customizations, and spontaneous speech phenomena.
Provides a benchmark for model robustness and generalization in task-oriented dialogue.
Abstract
LLMs and speech assistants are increasingly used for task-oriented interactions, yet their evaluation often relies on controlled scenarios that fail to capture the variability and complexity of real user requests. Drink ordering, for example, involves diverse named entities, drink types, sizes, customizations, and brand-specific terminology, as well as spontaneous speech phenomena such as hesitations and self-corrections. To address this gap, we introduce StarDrinks, a test set in English and Korean containing speech utterances, transcriptions, and annotated slots. Our dataset supports speech-to-slots SLU, transcription-to-slots NLU, and speech-to-transcription ASR evaluation, providing a realistic benchmark for model robustness and generalization in a linguistically rich, real-world task.
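To make the speech-to-slots and transcription-to-slots settings concrete, the sketch below scores predicted slots against reference slots with micro-averaged precision, recall, and F1. It is only an illustration: the abstract does not state which metrics or slot schema StarDrinks uses, so the function name `slot_f1` and the slot labels `drink` and `size` are assumptions made for this example.

```python
from collections import Counter


def slot_f1(references, predictions):
    """Micro-averaged precision, recall, and F1 over (slot, value) pairs.

    Each argument is a list of utterances; each utterance is a list of
    (slot_name, slot_value) tuples. Matching is exact and counts duplicates.
    """
    tp = fp = fn = 0
    for ref, pred in zip(references, predictions):
        ref_counts = Counter(ref)
        pred_counts = Counter(pred)
        # Multiset intersection keeps the minimum count of each shared pair.
        overlap = sum((ref_counts & pred_counts).values())
        tp += overlap
        fp += sum(pred_counts.values()) - overlap
        fn += sum(ref_counts.values()) - overlap
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical slot labels, not taken from the dataset.
references = [[("drink", "latte"), ("size", "large")]]
predictions = [[("drink", "latte"), ("size", "small")]]
precision, recall, f1 = slot_f1(references, predictions)
```

The same function could be applied to slots predicted from audio or from transcriptions, since it only compares the final slot lists.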
