Low-resource Low-footprint Wake-word Detection using Knowledge Distillation
Arindam Ghosh, Mark Fuhs, Deblin Bagchi, Bahman Farahani, Monika Woszczyna

TL;DR
This paper uses transfer learning and knowledge distillation from large-vocabulary acoustic modeling data to improve low-resource wake-word detection, reporting higher accuracy and lower latency on the open-source "Hey Snips" dataset and a more challenging in-house far-field dataset.
Contribution
It explores transfer learning and knowledge distillation from large-vocabulary acoustic modeling data, together with time-synchronous training targets, to improve a purpose-built wake-word detector in low-resource settings.
Findings
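Improved detection accuracy across dataset sizes on both the "Hey Snips" dataset and the in-house far-field dataset.
Reduced detection latency alongside the accuracy gains when phone-synchronous targets are used.
Knowledge distillation from a large acoustic model is effective for training the wake-word detector.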
Abstract
As virtual assistants have become more diverse and specialized, so has the demand for application or brand-specific wake words. However, the wake-word-specific datasets typically used to train wake-word detectors are costly to create. In this paper, we explore two techniques to leverage acoustic modeling data for large-vocabulary speech recognition to improve a purpose-built wake-word detector: transfer learning and knowledge distillation. We also explore how these techniques interact with time-synchronous training targets to improve detection latency. Experiments are presented on the open-source "Hey Snips" dataset and a more challenging in-house far-field dataset. Using phone-synchronous targets and knowledge distillation from a large acoustic model, we are able to improve accuracy across dataset sizes for both datasets while reducing latency.
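The abstract does not spell out the distillation objective, so the following is only a minimal sketch of a generic knowledge-distillation loss for a single audio frame, not the authors' formulation. It assumes the common setup in which a small student is trained against both a hard label and the softened posteriors of a large teacher. The function names, the `alpha` mixing weight, and the `temperature` parameter are all hypothetical choices made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; a higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, predicted_probs, eps=1e-12):
    """Cross-entropy of predicted_probs measured against target_probs."""
    return -sum(t * math.log(p + eps) for t, p in zip(target_probs, predicted_probs))


def distillation_loss(student_logits, teacher_logits, hard_label,
                      alpha=0.5, temperature=2.0):
    """Generic per-frame distillation loss (an assumed form, not the paper's).

    alpha weights the hard-label term; (1 - alpha) weights the term that
    pulls the student toward the teacher's softened posteriors.
    """
    teacher_soft = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    soft_term = cross_entropy(teacher_soft, student_soft)

    one_hot = [1.0 if i == hard_label else 0.0 for i in range(len(student_logits))]
    hard_term = cross_entropy(one_hot, softmax(student_logits))

    return alpha * hard_term + (1.0 - alpha) * soft_term


# Illustrative logits over three made-up output classes for one frame.
teacher = [2.0, 0.5, -1.0]
agreeing_student = [2.0, 0.5, -1.0]
disagreeing_student = [-1.0, 0.5, 2.0]

loss_agree = distillation_loss(agreeing_student, teacher, hard_label=0)
loss_disagree = distillation_loss(disagreeing_student, teacher, hard_label=0)
print(loss_agree < loss_disagree)
```

In this sketch a student whose outputs agree with the teacher receives a lower loss than one that disagrees, which is the behavior a distillation objective is meant to reward. With time-synchronous targets, a loss of this kind would be applied frame by frame, but the exact alignment and weighting used in the paper are not stated in the abstract.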
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
Methods: Knowledge Distillation
