Two-Pass Low Latency End-to-End Spoken Language Understanding
Siddhant Arora, Siddharth Dalmia, Xuankai Chang, Brian Yan, Alan Black, Shinji Watanabe

TL;DR
This paper introduces a two-pass, low-latency end-to-end spoken language understanding system. The first pass makes a quick prediction from acoustic information alone; the second pass refines it with semantic information, improving accuracy while keeping latency low.
Contribution
The paper proposes a novel 2-pass SLU framework that combines acoustic and semantic information for low-latency, high-accuracy spoken language understanding.
Findings
Outperforms acoustic-only SLU models on benchmark datasets
Reduces inference latency compared to single-pass models
Enhances understanding of semantic content in spoken language
Abstract
End-to-end (E2E) models are becoming increasingly popular for spoken language understanding (SLU) systems and are beginning to achieve competitive performance to pipeline-based approaches. However, recent work has shown that these models struggle to generalize to new phrasings for the same intent indicating that models cannot understand the semantic content of the given utterance. In this work, we incorporated language models pre-trained on unlabeled text data inside E2E-SLU frameworks to build strong semantic representations. Incorporating both semantic and acoustic information can increase the inference time, leading to high latency when deployed for applications like voice assistants. We developed a 2-pass SLU system that makes low latency prediction using acoustic information from the few seconds of the audio in the first pass and makes higher quality prediction in the second pass…
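The two-pass flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class `TwoPassSLU`, the three callables, the `semantic_weight` interpolation, and the toy scorers are all hypothetical stand-ins, and simple score interpolation is used here in place of whatever combination mechanism the paper actually uses. The sketch only shows the control flow: an early prediction from the first few seconds of audio, followed by a later prediction that also uses semantic information.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

Scores = Dict[str, float]


@dataclass
class TwoPassSLU:
    # Hypothetical stand-ins for trained models (assumptions, not the paper's API).
    acoustic_scorer: Callable[[Sequence[float]], Scores]
    semantic_scorer: Callable[[str], Scores]
    transcribe: Callable[[Sequence[float]], str]
    sample_rate: int = 16000
    first_pass_seconds: float = 2.0
    semantic_weight: float = 0.5  # assumed interpolation weight

    def first_pass(self, audio: Sequence[float]) -> str:
        """Low-latency guess from only the first few seconds of audio."""
        n = int(self.sample_rate * self.first_pass_seconds)
        scores = self.acoustic_scorer(audio[:n])
        return max(scores, key=scores.get)

    def second_pass(self, audio: Sequence[float]) -> str:
        """Refined guess that mixes acoustic and semantic scores."""
        acoustic = self.acoustic_scorer(audio)
        semantic = self.semantic_scorer(self.transcribe(audio))
        w = self.semantic_weight
        labels = set(acoustic) | set(semantic)
        combined = {
            k: (1 - w) * acoustic.get(k, 0.0) + w * semantic.get(k, 0.0)
            for k in labels
        }
        return max(combined, key=combined.get)


# Toy scorers, for illustration only.
def toy_acoustic(audio: Sequence[float]) -> Scores:
    loud = sum(abs(x) for x in audio) / max(len(audio), 1) > 0.5
    if loud:
        return {"lights_on": 0.6, "lights_off": 0.4}
    return {"lights_on": 0.4, "lights_off": 0.6}


def toy_transcribe(audio: Sequence[float]) -> str:
    return "turn off the lights"


def toy_semantic(text: str) -> Scores:
    if "off" in text.split():
        return {"lights_on": 0.1, "lights_off": 0.9}
    return {"lights_on": 0.9, "lights_off": 0.1}


slu = TwoPassSLU(toy_acoustic, toy_semantic, toy_transcribe,
                 sample_rate=4, first_pass_seconds=2.0)
audio = [1.0] * 8 + [0.2] * 8
early = slu.first_pass(audio)     # uses only the first 8 samples
refined = slu.second_pass(audio)  # uses full audio plus the transcript
```

In this toy run the first pass commits to an answer from the acoustic prefix alone, and the second pass revises it once the semantic scores are available, which mirrors the trade-off between latency and quality that the abstract describes.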
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and dialogue systems
