Small-footprint Keyword Spotting Using Deep Neural Network and Connectionist Temporal Classifier
Zhiming Wang, Xiaolong Li, Jun Zhou

TL;DR
This paper presents a small-footprint keyword spotting system using DNN and CTC that leverages continuous speech data, achieving competitive accuracy without increasing computational complexity on mobile devices.
Contribution
It introduces a CTC-based keyword spotting approach that trains on a general speech corpus, reducing the need for keyword-specific training data.
Findings
Competitive performance with existing DNN-based KWS
No additional computational complexity introduced
Effective use of general speech corpus for keyword spotting
Abstract
To address the scarcity of keyword-specific data, we propose a Keyword Spotting (KWS) system that uses a Deep Neural Network (DNN) and a Connectionist Temporal Classifier (CTC) on power-constrained, small-footprint mobile devices, taking full advantage of the large general corpora available from continuous speech recognition. The DNN directly predicts the posteriors of the phoneme units of any user-customized key-phrase, and CTC produces a confidence score for the given phoneme sequence, which serves as the decision-making mechanism. The CTC-KWS achieves performance competitive with purely DNN-based, keyword-specific KWS without increasing computational complexity.
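The scoring step described in the abstract can be illustrated with the standard CTC forward algorithm, which sums the probability of every frame-level alignment that collapses to the key-phrase's phoneme sequence. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the choice of index 0 for the blank unit, and the log-threshold decision rule are all hypothetical, and the per-frame log posteriors are assumed to come from some acoustic DNN.

```python
import math

NEG_INF = float("-inf")


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) for a few values."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_log_score(log_post, phonemes, blank=0):
    """Return log P(phonemes | frames) using the CTC forward recursion.

    log_post: list of T frames, each a list of log posteriors over units
              (assumed to be DNN outputs; index `blank` is the CTC blank).
    phonemes: non-empty list of unit indices for the key-phrase, no blanks.
    """
    # Extended label sequence: blank, p1, blank, p2, ..., blank
    ext = [blank]
    for p in phonemes:
        ext += [p, blank]
    size = len(ext)

    # Frame 0: a path may start in the leading blank or the first phoneme.
    alpha = [NEG_INF] * size
    alpha[0] = log_post[0][ext[0]]
    alpha[1] = log_post[0][ext[1]]

    for t in range(1, len(log_post)):
        prev = alpha
        alpha = [NEG_INF] * size
        for s in range(size):
            cands = [prev[s]]                      # stay on the same label
            if s >= 1:
                cands.append(prev[s - 1])          # advance by one label
            # Skip over a blank only between two different phonemes.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(prev[s - 2])
            alpha[s] = logsumexp(*cands) + log_post[t][ext[s]]

    # A valid path ends in the last phoneme or the trailing blank.
    return logsumexp(alpha[-1], alpha[-2])


def keyword_detected(log_post, phonemes, log_threshold):
    """Hypothetical decision rule: fire when the CTC log score clears a threshold."""
    return ctc_log_score(log_post, phonemes) >= log_threshold
```

As a sanity check, with two frames, uniform posteriors over two units (blank and one phoneme), and a one-phoneme key-phrase, three of the four possible paths collapse to the phoneme, so `math.exp(ctc_log_score(...))` comes out at approximately 0.75. In practice the threshold would be tuned on held-out data, and a deployed system might normalize the score by length; neither detail is specified in the abstract.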
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
