Latency Control for Keyword Spotting

Christin Jose; Joseph Wang; Grant P. Strimel; Mohammad Omar Khursheed; Yuriy Mishchenko; Brian Kulis

arXiv:2206.07261·eess.AS·September 30, 2022

TL;DR

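This paper introduces a method for controlling latency in keyword spotting (KWS) models that works with any loss function and needs no explicit knowledge of the keyword endpoint. A single tunable hyperparameter balances detection latency against accuracy, and the authors report a 25% relative improvement in false accepts at a fixed latency target over the state-of-the-art baseline.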

Contribution

A novel latency control approach for KWS models that is adaptable to any loss function and improves detection performance under latency constraints.

Findings

01. 25% relative reduction in false accepts at a fixed latency target, compared with the state-of-the-art baseline

02. Latency-accuracy tradeoff controlled through a single tunable hyperparameter

03. Improved false accept rates when combined with max-pooling loss
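To make the "relative" in the first finding concrete, here is a minimal sketch of how a relative reduction is computed. The false-accept counts are hypothetical and chosen only to illustrate the arithmetic; they are not taken from the paper, and `relative_reduction` is an illustrative helper, not something the authors define.

```python
def relative_reduction(baseline: float, candidate: float) -> float:
    """Return the fractional reduction of `candidate` relative to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - candidate) / baseline


# Hypothetical false-accept counts at the same latency target (not from the paper).
baseline_false_accepts = 200
candidate_false_accepts = 150

reduction = relative_reduction(baseline_false_accepts, candidate_false_accepts)
print(f"{reduction:.0%}")  # prints "25%"
```

A relative figure like this describes the change as a share of the baseline's false accepts, so it says nothing about the absolute false-accept rate of either model.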

Abstract

Conversational agents commonly utilize keyword spotting (KWS) to initiate voice interaction with the user. For user experience and privacy considerations, existing approaches to KWS largely focus on accuracy, which can often come at the expense of introduced latency. To address this tradeoff, we propose a novel approach to control KWS model latency that generalizes to any loss function without explicit knowledge of the keyword endpoint. Through a single, tunable hyperparameter, our approach enables one to balance detection latency and accuracy for the targeted application. Empirically, we show that our approach gives superior performance under latency constraints when compared to existing methods. Namely, we achieve a substantial 25% relative improvement in false accepts for a fixed latency target when compared to the state-of-the-art baseline. We also show that when our approach is…
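As a rough illustration of what "detection latency" can mean in this setting, the sketch below measures the delay between a keyword endpoint and the first frame at which a model's posterior crosses a threshold. This is an assumed, generic evaluation-time definition, not the paper's training method or its exact metric. The function name, the 10 ms frame shift, the threshold, and the posterior values are all hypothetical.

```python
from typing import Optional, Sequence


def detection_latency_ms(
    posteriors: Sequence[float],
    threshold: float,
    endpoint_frame: int,
    frame_shift_ms: float = 10.0,  # assumed frame shift, not from the paper
) -> Optional[float]:
    """Latency of the first threshold crossing relative to the keyword endpoint.

    Returns None when the posterior never reaches the threshold (a miss).
    A negative value means the detection fired before the endpoint.
    """
    for frame, posterior in enumerate(posteriors):
        if posterior >= threshold:
            return (frame - endpoint_frame) * frame_shift_ms
    return None


# Hypothetical per-frame keyword posteriors; the keyword ends at frame 2.
posteriors = [0.1, 0.2, 0.4, 0.7, 0.9]
latency = detection_latency_ms(posteriors, threshold=0.6, endpoint_frame=2)
print(latency)  # prints 10.0
```

Note that this measurement needs a reference endpoint, whereas the abstract states that the proposed training approach does not require explicit endpoint knowledge.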
