SLUE: New Benchmark Tasks for Spoken Language Understanding Evaluation on Natural Speech
Suwon Shon, Ankita Pasad, Felix Wu, Pablo Brusco, Yoav Artzi, Karen Livescu, Kyu J. Han

TL;DR
This paper introduces SLUE, a new benchmark suite for evaluating higher-level spoken language understanding tasks on natural speech, including datasets, annotations, and baseline results to facilitate progress in the field.
Contribution
It presents the first phase of SLUE, a benchmark suite with datasets, annotations, and evaluation tools for spoken language understanding tasks on natural speech.
Findings
Baseline results establish reference performance levels for each task.
The benchmark provides new annotated datasets for named entity recognition (NER), sentiment analysis, and ASR.
Open-source toolkit enables reproducibility and further research.
Abstract
Progress in speech processing has been facilitated by shared datasets and benchmarks. Historically these have focused on automatic speech recognition (ASR), speaker identification, or other lower-level tasks. Interest has been growing in higher-level spoken language understanding tasks, including using end-to-end models, but there are fewer annotated datasets for such tasks. At the same time, recent work shows the possibility of pre-training generic representations and then fine-tuning for several tasks using relatively little labeled data. We propose to create a suite of benchmark tasks for Spoken Language Understanding Evaluation (SLUE) consisting of limited-size labeled training sets and corresponding evaluation sets. This resource would allow the research community to track progress, evaluate pre-trained representations for higher-level tasks, and study open questions such as the…
