Deep Spoken Keyword Spotting: An Overview
Iván López-Espejo, Zheng-Hua Tan, John Hansen, Jesper Jensen

TL;DR
This paper provides a comprehensive review of deep spoken keyword spotting, covering system components, robustness, applications, datasets, and future research directions in this rapidly evolving field.
Contribution
It offers a thorough overview of deep KWS systems, analyzing their components, robustness methods, applications, and datasets, and it identifies future research directions specific to spoken keyword spotting.
Findings
Deep KWS systems have evolved with various speech features and acoustic models.
Robustness methods improve KWS performance in noisy environments.
Future research includes integrating ASR techniques and exploring audio-visual KWS.
Abstract
Spoken keyword spotting (KWS) deals with the identification of keywords in audio streams and has become a fast-growing technology thanks to the paradigm shift introduced by deep learning a few years ago. This has allowed the rapid embedding of deep KWS in a myriad of small electronic devices with different purposes like the activation of voice assistants. Prospects suggest a sustained growth in terms of social use of this technology. Thus, it is not surprising that deep KWS has become a hot research topic among speech scientists, who constantly look for KWS performance improvement and computational complexity reduction. This context motivates this paper, in which we conduct a literature review into deep spoken KWS to assist practitioners and researchers who are interested in this technology. Specifically, this overview has a comprehensive nature by covering a thorough analysis of deep…
