Low-resource Low-footprint Wake-word Detection using Knowledge Distillation

Arindam Ghosh; Mark Fuhs; Deblin Bagchi; Bahman Farahani; Monika Woszczyna

arXiv:2207.03331 · eess.AS · July 8, 2022 · 1 citation

TL;DR

This paper presents methods to improve low-resource wake-word detection by leveraging large-vocabulary acoustic-modeling data through transfer learning and knowledge distillation, improving accuracy and reducing latency on the open-source "Hey Snips" dataset and on a more challenging in-house far-field dataset.
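The transfer-learning idea can be illustrated as initializing the small detector's shared layers from a network trained on large-vocabulary acoustic-modeling data, then giving it a freshly initialized output layer for the wake-word targets before fine-tuning. The sketch below is a minimal illustration, not the authors' code: it assumes parameters are stored as plain name-to-list dictionaries, and the helper `transfer_init` and the layer names are hypothetical.

```python
import random
from typing import Dict, List

Params = Dict[str, List[float]]


def transfer_init(source: Params, target: Params,
                  reinit_prefixes: List[str], seed: int = 0) -> Params:
    """Hypothetical helper: copy source weights into the target model.

    Layers whose name starts with one of `reinit_prefixes` (e.g. the output
    layer, whose size differs between an ASR acoustic model and a wake-word
    detector) are freshly initialized instead of copied.
    """
    rng = random.Random(seed)
    out: Params = {}
    for name, weights in target.items():
        src = source.get(name)
        if any(name.startswith(p) for p in reinit_prefixes):
            # New task-specific layer: small random initialization.
            out[name] = [rng.uniform(-0.1, 0.1) for _ in weights]
        elif src is not None and len(src) == len(weights):
            # Shared layer with matching size: inherit pretrained weights.
            out[name] = list(src)
        else:
            # No compatible source layer: keep the target's own initialization.
            out[name] = list(weights)
    return out


# Toy example: the "ASR" model has 5 output classes, the detector only 3.
asr = {"conv1": [0.5, -0.2], "gru1": [0.1, 0.3, 0.7], "output": [0.0] * 5}
kws = {"conv1": [0.0, 0.0], "gru1": [0.0, 0.0, 0.0], "output": [0.0] * 3}
init = transfer_init(asr, kws, reinit_prefixes=["output"])
```

After this initialization the whole detector would typically be fine-tuned on the (small) wake-word dataset; whether the paper freezes any layers is not stated here.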

Contribution

It explores transfer learning and knowledge distillation from large-vocabulary speech-recognition data, and how each interacts with time-synchronous (phone-synchronous) training targets, to improve wake-word detection in low-resource settings.
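Time-synchronous training assigns a target to every frame instead of a single label per utterance. A minimal sketch of building phone-synchronous frame targets from a forced alignment follows; it assumes a hypothetical phone sequence for "Hey Snips", alignment segments given in frames, and class 0 reserved for non-wake-word (filler) frames. The paper's exact labeling scheme may differ.

```python
from typing import List, Tuple

# Hypothetical phone sequence for the wake word; class 0 is "filler".
WAKE_PHONES = ["hh", "ey", "s", "n", "ih", "p", "s"]


def phone_sync_targets(num_frames: int,
                       alignment: List[Tuple[int, int, int]]) -> List[int]:
    """Per-frame targets from (start_frame, end_frame_exclusive, phone_position).

    phone_position indexes WAKE_PHONES; its class id is position + 1, so the
    two "s" phones at different positions get distinct classes.
    """
    targets = [0] * num_frames
    for start, end, pos in alignment:
        for t in range(max(0, start), min(end, num_frames)):
            targets[t] = pos + 1
    return targets


# 12-frame utterance: filler, then the seven wake-word phones, then filler.
alignment = [(2, 4, 0), (4, 6, 1), (6, 7, 2), (7, 8, 3),
             (8, 9, 4), (9, 10, 5), (10, 11, 6)]
targets = phone_sync_targets(12, alignment)
```

Because each frame is tied to a specific phone position, the frames of the final phone mark where the wake word ends, which gives the model a time-aligned cue for when to fire.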

Findings

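01. Improved detection accuracy across training-set sizes on both the "Hey Snips" dataset and the in-house far-field dataset.
02. Reduced detection latency when training with phone-synchronous targets.
03. Knowledge distillation from a large acoustic model, combined with phone-synchronous targets, underlies these gains.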
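Detection latency can be measured as the delay between the end of the wake word and the first frame at which the detector's score crosses its threshold. The sketch below assumes per-frame wake-word scores, a fixed threshold, a 10 ms frame shift, and a word-end frame taken from a forced alignment; these are illustrative assumptions, and the paper's exact latency definition may differ.

```python
from typing import List, Optional

FRAME_MS = 10  # assumed frame shift in milliseconds


def first_trigger(scores: List[float], threshold: float) -> Optional[int]:
    """Index of the first frame whose score reaches the threshold, or None."""
    for t, s in enumerate(scores):
        if s >= threshold:
            return t
    return None


def detection_latency_ms(scores: List[float], threshold: float,
                         word_end_frame: int) -> Optional[float]:
    """Milliseconds from the wake word's end to the trigger; None if missed.

    A negative value means the detector fired before the word ended.
    """
    t = first_trigger(scores, threshold)
    if t is None:
        return None
    return float((t - word_end_frame) * FRAME_MS)


scores = [0.0, 0.1, 0.2, 0.4, 0.7, 0.9, 0.95]
latency = detection_latency_ms(scores, 0.8, word_end_frame=3)
missed = detection_latency_ms(scores, 0.99, word_end_frame=3)
```

In this toy example the score first reaches 0.8 at frame 5, two frames after the word ends, so the latency is 20 ms; a threshold of 0.99 is never reached and the utterance counts as a miss.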

Abstract

As virtual assistants have become more diverse and specialized, so has the demand for application- or brand-specific wake words. However, the wake-word-specific datasets typically used to train wake-word detectors are costly to create. In this paper, we explore two techniques to leverage acoustic modeling data for large-vocabulary speech recognition to improve a purpose-built wake-word detector: transfer learning and knowledge distillation. We also explore how these techniques interact with time-synchronous training targets to improve detection latency. Experiments are presented on the open-source "Hey Snips" dataset and a more challenging in-house far-field dataset. Using phone-synchronous targets and knowledge distillation from a large acoustic model, we are able to improve accuracy across dataset sizes for both datasets while reducing latency.
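Knowledge distillation trains the small detector to match the output distribution of a larger teacher model. The sketch below shows the standard temperature-scaled distillation loss for a single frame, assuming teacher and student share the same output classes. How the paper maps the large acoustic model's outputs onto the detector's classes, and how it weights the terms, is not described here; `alpha` and `temperature` are hypothetical hyperparameters.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_frame_loss(student_logits: List[float], teacher_logits: List[float],
                  hard_target: int, alpha: float = 0.5,
                  temperature: float = 2.0) -> float:
    """alpha * CE(hard target) + (1 - alpha) * T^2 * KL(teacher_T || student_T)."""
    # Cross-entropy of the student against the frame's hard label (T = 1).
    ce = -math.log(softmax(student_logits)[hard_target])
    # KL divergence between temperature-softened teacher and student outputs.
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_t, p_s) if pt > 0)
    # The T^2 factor keeps the soft-target gradient scale comparable across T.
    return alpha * ce + (1 - alpha) * temperature ** 2 * kl


loss = kd_frame_loss([2.0, 0.0, 0.0], [1.5, 0.5, 0.0], hard_target=0)
```

In training, this per-frame loss would be averaged over all frames and utterances; setting `alpha=1.0` reduces it to ordinary cross-entropy on the hard labels.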

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing

Methods: Knowledge Distillation