Streaming Intended Query Detection using E2E Modeling for Continued Conversation
Shuo-yiin Chang, Guru Prakash, Zelin Wu, Qiao Liang, Tara N. Sainath,, Bo Li, Adam Stambler, Shyam Upadhyay, Manaal Faruqui, Trevor Strohman

TL;DR
This paper introduces a streaming end-to-end intended query detector integrated into speech recognition systems to improve detection accuracy and reduce latency in voice-activated devices, enhancing user experience in continuous conversations.
Contribution
It presents a novel E2E model for intended query detection that reduces latency and improves detection accuracy compared to traditional methods.
Findings
22% relative improvement in EER for detection accuracy
600 ms latency reduction over independent detectors
Detects user intent with 8.7% EER within 1.4 seconds
Abstract
In voice-enabled applications, a predetermined hotword isusually used to activate a device in order to attend to the query.However, speaking queries followed by a hotword each timeintroduces a cognitive burden in continued conversations. Toavoid repeating a hotword, we propose a streaming end-to-end(E2E) intended query detector that identifies the utterancesdirected towards the device and filters out other utterancesnot directed towards device. The proposed approach incor-porates the intended query detector into the E2E model thatalready folds different components of the speech recognitionpipeline into one neural network.The E2E modeling onspeech decoding and intended query detection also allows us todeclare a quick intended query detection based on early partialrecognition result, which is important to decrease latencyand make the system responsive. We demonstrate that theproposed E2E…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSpeech Recognition and Synthesis · Speech and Audio Processing · Speech and dialogue systems
