STOP: A dataset for Spoken Task Oriented Semantic Parsing
Paden Tomasello, Akshat Shrivastava, Daniel Lazar, Po-Chun Hsu, Duc Le, Adithya Sagar, Ali Elkahky, Jade Copet, Wei-Ning Hsu, Yossi Adi, Robin Algayres, Tu Anh Nguyen, Emmanuel Dupoux, Luke Zettlemoyer, Abdelrahman Mohamed

TL;DR
This paper introduces the STOP dataset, a large and complex spoken language understanding dataset with semantic parse labels, to facilitate research in end-to-end SLU models that directly predict intent from audio.
Contribution
The paper releases the largest publicly available SLU dataset, with both human-recorded and TTS-generated audio, and defines low-resource splits as a benchmark for end-to-end SLU development.
Findings
End-to-end SLU models perform slightly worse than cascaded models.
The dataset includes human-recorded and TTS-generated audio for domain adaptation.
Benchmark results highlight challenges and opportunities for low-resource SLU.
Abstract
End-to-end spoken language understanding (SLU) predicts intent directly from audio using a single model. It promises to improve the performance of assistant systems by leveraging acoustic information lost in the intermediate textual representation and preventing cascading errors from Automatic Speech Recognition (ASR). Further, having one unified model has efficiency advantages when deploying assistant systems on-device. However, the limited number of public audio datasets with semantic parse labels hinders the research progress in this area. In this paper, we release the Spoken Task-Oriented semantic Parsing (STOP) dataset, the largest and most complex SLU dataset to be publicly available. Additionally, we define low-resource splits to establish a benchmark for improving SLU when limited labeled data is available. Furthermore, in addition to the human-recorded audio, we are releasing a…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
