Streaming Intended Query Detection using E2E Modeling for Continued Conversation

Shuo-yiin Chang; Guru Prakash; Zelin Wu; Qiao Liang; Tara N. Sainath; Bo Li; Adam Stambler; Shyam Upadhyay; Manaal Faruqui; Trevor Strohman

arXiv:2208.13322 · cs.CL · August 30, 2022

Open Access

TL;DR

This paper introduces a streaming end-to-end intended query detector integrated into speech recognition systems to improve detection accuracy and reduce latency in voice-activated devices, enhancing user experience in continuous conversations.

Contribution

It presents a novel E2E model for intended query detection that reduces latency and improves detection accuracy compared with an independent intended query detector.

Findings

01. 22% relative improvement in EER for detection accuracy
02. 600 ms latency reduction over independent detectors
03. Detects user intent with 8.7% EER within 1.4 seconds
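
The findings above are stated in terms of equal error rate (EER), the operating point at which the false accept rate and the false reject rate of a detector are equal. The following is a minimal sketch of how EER and a relative improvement could be computed from detector scores. The function names, the threshold sweep, and the toy scores are illustrative assumptions, not the paper's evaluation code or data.

```python
def equal_error_rate(scores, labels):
    """Approximate EER by sweeping a threshold over the observed scores.

    scores: detector confidences (higher means "directed at the device").
    labels: 1 for intended queries, 0 for everything else.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    best_gap, best_eer = None, None
    # Candidate thresholds: every observed score, plus +inf (accept nothing).
    for t in sorted(set(scores)) + [float("inf")]:
        far = sum(s >= t for s in negatives) / len(negatives)  # false accepts
        frr = sum(s < t for s in positives) / len(positives)   # false rejects
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


def relative_improvement(baseline, new):
    """Relative reduction of an error metric, e.g. 0.22 for a 22% gain."""
    return (baseline - new) / baseline


# Toy scores, made up for illustration only (not from the paper).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 1, 0, 0, 0]
print(equal_error_rate(toy_scores, toy_labels))  # 0.25
```

On a finite test set the two error rates rarely cross exactly, so this sketch reports the average of the two rates at the threshold where they are closest. Evaluation toolkits often interpolate instead.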

Abstract

In voice-enabled applications, a predetermined hotword is usually used to activate a device in order to attend to the query. However, speaking queries followed by a hotword each time introduces a cognitive burden in continued conversations. To avoid repeating a hotword, we propose a streaming end-to-end (E2E) intended query detector that identifies the utterances directed towards the device and filters out other utterances not directed towards device. The proposed approach incorporates the intended query detector into the E2E model that already folds different components of the speech recognition pipeline into one neural network. The E2E modeling on speech decoding and intended query detection also allows us to declare a quick intended query detection based on early partial recognition result, which is important to decrease latency and make the system responsive. We demonstrate that the proposed E2E…
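
The abstract's point about declaring a decision from early partial results can be illustrated with a generic streaming rule: consume per-frame confidences as they arrive and stop as soon as one crosses a threshold. This is only a sketch of the latency idea, not the paper's model. The function name `early_decision`, the two thresholds, and the 60 ms frame step are all hypothetical.

```python
from typing import Iterable, Optional, Tuple


def early_decision(
    posteriors: Iterable[float],
    accept: float = 0.9,   # hypothetical threshold for "directed at the device"
    reject: float = 0.1,   # hypothetical threshold for "not directed at the device"
    frame_ms: int = 60,    # hypothetical time step between streaming outputs
) -> Tuple[Optional[bool], int]:
    """Return (decision, elapsed_ms) as soon as a streaming score is confident.

    decision is True (intended query), False (not intended), or None if the
    stream ended before either threshold was crossed.
    """
    elapsed_ms = 0
    for p in posteriors:
        elapsed_ms += frame_ms
        if p >= accept:
            return True, elapsed_ms
        if p <= reject:
            return False, elapsed_ms
    return None, elapsed_ms


# Made-up per-frame confidences: the decision fires on the third frame.
print(early_decision([0.5, 0.6, 0.95, 0.99]))  # (True, 180)
```

The elapsed time returned alongside the decision is the quantity a latency measurement would aggregate across utterances, for example as a median.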

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Speech and dialogue systems