A Streaming End-to-End Framework For Spoken Language Understanding
Nihal Potdar, Anderson R. Avila, Chao Xing, Dong Wang, Yiran Cao, Xiao Chen

TL;DR
This paper introduces a streaming end-to-end SLU framework using a unidirectional RNN with CTC that can process multiple intentions online, achieving high accuracy comparable to non-streaming models.
Contribution
The paper presents a streaming, incremental SLU framework that recognizes multiple intentions in real time, whereas previous end-to-end models typically process one intention at a time.
Findings
Achieved about 97% intent detection accuracy on the Fluent Speech Commands (FSC) dataset.
Performed well on keyword spotting with the Google Speech Commands dataset.
Comparable to state-of-the-art non-streaming models in accuracy.
Abstract
End-to-end spoken language understanding (SLU) has recently attracted increasing interest. Compared to the conventional tandem-based approach that combines speech recognition and language understanding as separate modules, the new approach extracts users' intentions directly from the speech signals, resulting in joint optimization and low latency. Such an approach, however, is typically designed to process one intention at a time, which leads users to take multiple rounds to fulfill their requirements while interacting with a dialogue system. In this paper, we propose a streaming end-to-end framework that can process multiple intentions in an online and incremental way. The backbone of our framework is a unidirectional RNN trained with the connectionist temporal classification (CTC) criterion. By this design, an intention can be identified when sufficient evidence has been accumulated,…
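The idea of emitting an intention as soon as enough evidence has accumulated can be illustrated with incremental greedy CTC decoding. The following is a minimal sketch, not the authors' implementation: it assumes the model outputs one posterior vector per frame, that index 0 is the CTC blank, and that each remaining index is an intent label. The function name `stream_ctc_greedy` and the `threshold` parameter are hypothetical and introduced only for illustration.

```python
from typing import Iterable, Iterator, Sequence, Tuple

BLANK = 0  # assumption: index 0 is the CTC blank label


def stream_ctc_greedy(
    frames: Iterable[Sequence[float]],
    threshold: float = 0.5,  # hypothetical confidence floor, not from the paper
) -> Iterator[Tuple[int, int]]:
    """Yield (frame_index, label) the moment a new non-blank label fires.

    Frames are consumed one at a time, so nothing here looks ahead;
    this mirrors what a unidirectional model could do online.
    """
    prev = BLANK
    for t, probs in enumerate(frames):
        # Greedy choice: the most probable label for this frame.
        best = max(range(len(probs)), key=probs.__getitem__)
        # Treat low-confidence frames as blank (illustrative assumption).
        if probs[best] < threshold:
            best = BLANK
        # CTC collapse rule: skip blanks and immediate repeats.
        if best != BLANK and best != prev:
            yield t, best
        prev = best


if __name__ == "__main__":
    # Toy posteriors over [blank, intent_1, intent_2], one row per frame.
    toy_frames = [
        [0.90, 0.05, 0.05],
        [0.20, 0.70, 0.10],
        [0.10, 0.80, 0.10],
        [0.90, 0.05, 0.05],
        [0.10, 0.10, 0.80],
    ]
    for frame_index, label in stream_ctc_greedy(toy_frames):
        print(f"frame {frame_index}: intent {label}")
```

Because the generator yields inside the loop, a caller can act on the first intention while later audio is still arriving, which is the behaviour the abstract describes for multi-intention utterances.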
Taxonomy
Topics: Speech and dialogue systems · Topic Modeling · Speech Recognition and Synthesis
