STOP: A dataset for Spoken Task Oriented Semantic Parsing

Paden Tomasello; Akshat Shrivastava; Daniel Lazar; Po-Chun Hsu; Duc Le; Adithya Sagar; Ali Elkahky; Jade Copet; Wei-Ning Hsu; Yossi Adi; Robin Algayres; Tu Anh Nguyen; Emmanuel Dupoux; Luke Zettlemoyer; Abdelrahman Mohamed

arXiv:2207.10643·cs.CL·October 19, 2022

TL;DR

This paper introduces the STOP dataset, a large and complex spoken language understanding dataset with semantic parse labels, to facilitate research in end-to-end SLU models that directly predict intent from audio.

Contribution

The paper releases the largest publicly available SLU dataset, with both human-recorded and TTS-generated audio, and establishes low-resource benchmarks for developing end-to-end SLU systems.

Findings

1. End-to-end SLU models perform slightly worse than cascaded models.
2. The dataset includes human-recorded and TTS-generated audio for domain adaptation.
3. Benchmark results highlight challenges and opportunities for low-resource SLU.

Abstract

End-to-end spoken language understanding (SLU) predicts intent directly from audio using a single model. It promises to improve the performance of assistant systems by leveraging acoustic information lost in the intermediate textual representation and preventing cascading errors from Automatic Speech Recognition (ASR). Further, having one unified model has efficiency advantages when deploying assistant systems on-device. However, the limited number of public audio datasets with semantic parse labels hinders the research progress in this area. In this paper, we release the Spoken Task-Oriented semantic Parsing (STOP) dataset, the largest and most complex SLU dataset to be publicly available. Additionally, we define low-resource splits to establish a benchmark for improving SLU when limited labeled data is available. Furthermore, in addition to the human-recorded audio, we are releasing a…
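To make "semantic parse labels" concrete: task-oriented parsing datasets in this family (TOP, TOPv2) write each label as a bracketed tree of intents (`IN:`) and slots (`SL:`) over the utterance tokens. The sketch below assumes STOP's labels use this TOP-style notation; check the released files for the exact format. It parses such a string into a tree and scores a prediction by exact match, a strict metric under which a single misrecognized slot word (the kind of ASR error a cascaded system can propagate into its parser) makes the whole parse wrong. The function names `parse_top`, `labels`, and `exact_match`, and the example utterance, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    label: str  # intent ("IN:...") or slot ("SL:...") label
    children: List[Union["Node", str]] = field(default_factory=list)


def parse_top(s: str) -> Node:
    """Parse an assumed TOP-style string such as
    '[IN:GET_WEATHER weather in [SL:LOCATION seattle ] ]' into a tree."""
    tokens = s.replace("[", " [ ").replace("]", " ] ").split()
    pos = 0

    def expect_more() -> None:
        # Guard against unbalanced brackets running past the end of input.
        if pos >= len(tokens):
            raise ValueError("unbalanced brackets: input ended early")

    def parse_node() -> Node:
        nonlocal pos
        expect_more()
        if tokens[pos] != "[":
            raise ValueError(f"expected '[' at token {pos}, got {tokens[pos]!r}")
        pos += 1
        expect_more()
        node = Node(tokens[pos])  # the token right after '[' is the label
        pos += 1
        expect_more()
        while tokens[pos] != "]":
            if tokens[pos] == "[":
                node.children.append(parse_node())  # nested intent or slot
            else:
                node.children.append(tokens[pos])  # plain utterance token
                pos += 1
            expect_more()
        pos += 1  # consume the closing "]"
        return node

    root = parse_node()
    if pos != len(tokens):
        raise ValueError("extra tokens after the root node")
    return root


def labels(node: Node) -> List[str]:
    """Pre-order list of intent and slot labels in the tree."""
    out = [node.label]
    for child in node.children:
        if isinstance(child, Node):
            out.extend(labels(child))
    return out


def exact_match(pred: str, gold: str) -> bool:
    """True only if the predicted tree equals the gold tree exactly
    (dataclass equality compares labels and children recursively)."""
    return parse_top(pred) == parse_top(gold)


if __name__ == "__main__":
    gold = "[IN:GET_WEATHER weather in [SL:LOCATION seattle ] ]"
    print(labels(parse_top(gold)))  # ['IN:GET_WEATHER', 'SL:LOCATION']
    # A misheard slot word breaks exact match even though the intent is right.
    print(exact_match("[IN:GET_WEATHER weather in [SL:LOCATION boston ] ]", gold))  # False
```

Comparing whole trees rather than flat label lists is a deliberate choice here: it is the strictest view, so it shows clearly why token-level recognition errors matter for spoken parsing.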

Code & Models

Repositories

facebookresearch/fairseq
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems