Wake Word Detection with Alignment-Free Lattice-Free MMI

Yiming Wang; Hang Lv; Daniel Povey; Lei Xie; Sanjeev Khudanpur

arXiv:2005.08347 · eess.AS · July 30, 2020 · 5 cites

TL;DR

This paper introduces an alignment-free LF-MMI training method for wake word detection that can use training examples annotated only for the presence or absence of the wake word. Combined with explicit silence modeling and an online decoder, it reduces false rejection rates at pre-specified false alarm rates.

Contribution

It presents an alignment-free LF-MMI training approach, incorporates explicit silence modeling, and develops an FST-based online decoder for wake word detection.
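
For orientation, the MMI-style objective that lattice-free training optimizes is commonly written in the form below. This is a generic sketch, not necessarily the paper's exact formulation, and the notation is assumed here: $\theta$ for the network parameters, $\mathbf{O}_u$ for the acoustic features of utterance $u$, $\mathbb{G}_{\mathrm{num}}^{(u)}$ for the utterance-specific numerator graph, and $\mathbb{G}_{\mathrm{den}}$ for the shared denominator graph.

```latex
% Generic LF-MMI-style objective (assumed notation, see lead-in).
% Numerator: likelihood of the features under the graph for the
% utterance's own label. Denominator: likelihood under a graph
% covering all competing hypotheses.
\mathcal{F}_{\mathrm{MMI}}(\theta)
  = \sum_{u=1}^{U}
    \log \frac{p_{\theta}\!\left(\mathbf{O}_u \mid \mathbb{G}_{\mathrm{num}}^{(u)}\right)}
              {p_{\theta}\!\left(\mathbf{O}_u \mid \mathbb{G}_{\mathrm{den}}\right)}
```

Under the alignment-free approach, the numerator graph would presumably be built from the utterance-level label alone (wake word present or absent) rather than constrained by frame-level alignments.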

Findings

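01 50%-90% reduction in false rejection rates at pre-specified false alarm rates, relative to the best previously published figures

02 Effective training with untranscribed data annotated only for the presence or absence of the wake word

03 Validated on multiple real datasets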

Abstract

Always-on spoken language interfaces, e.g. personal digital assistants, rely on a wake word to start processing spoken input. We present novel methods to train a hybrid DNN/HMM wake word detection system from partially labeled training data, and to use it in on-line applications: (i) we remove the prerequisite of frame-level alignments in the LF-MMI training algorithm, permitting the use of un-transcribed training examples that are annotated only for the presence/absence of the wake word; (ii) we show that the classical keyword/filler model must be supplemented with an explicit non-speech (silence) model for good performance; (iii) we present an FST-based decoder to perform online detection. We evaluate our methods on two real data sets, showing 50%-90% reduction in false rejection rates at pre-specified false alarm rates over the best previously published figures, and re-validate them…
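
The headline metric, false rejection rate at a pre-specified false alarm rate, can be illustrated with a minimal sketch. The function name `frr_at_false_alarms`, the score lists, and the rule "fire when score > threshold" are all assumptions made for illustration; the paper's own scoring and evaluation scripts may differ.

```python
from typing import Sequence, Tuple


def frr_at_false_alarms(
    pos_scores: Sequence[float],
    neg_scores: Sequence[float],
    max_false_alarms: int,
) -> Tuple[float, float]:
    """Return (threshold, false rejection rate) at a false-alarm budget.

    Hypothetical helper: a detection fires when score > threshold.
    pos_scores: detector scores for utterances containing the wake word.
    neg_scores: detector scores for utterances without the wake word.
    max_false_alarms: largest tolerated number of negatives that fire.
    """
    if not pos_scores:
        raise ValueError("need at least one positive example")
    if max_false_alarms < 0:
        raise ValueError("max_false_alarms must be non-negative")

    # Sort negatives from highest to lowest score.
    neg_sorted = sorted(neg_scores, reverse=True)

    if max_false_alarms >= len(neg_sorted):
        # Every negative may fire, so accept everything.
        threshold = float("-inf")
    else:
        # Using the (k+1)-th highest negative score as the threshold lets
        # at most k negatives score strictly above it.
        threshold = neg_sorted[max_false_alarms]

    # A positive is falsely rejected when it does not exceed the threshold.
    false_rejections = sum(1 for s in pos_scores if s <= threshold)
    return threshold, false_rejections / len(pos_scores)


if __name__ == "__main__":
    positives = [0.95, 0.8, 0.5, 0.3]
    negatives = [0.9, 0.6, 0.4, 0.2, 0.1]
    print(frr_at_false_alarms(positives, negatives, max_false_alarms=1))
```

In practice the false-alarm budget is often stated per hour of negative audio, in which case `max_false_alarms` would be the target rate multiplied by the duration of the negative set; that conversion is left out of the sketch.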

Code & Models

Repositories

kaldi-asr/kaldi (official)

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing