DONUT: CTC-based Query-by-Example Keyword Spotting

Loren Lugosch; Samuel Myer; Vikrant Singh Tomar

arXiv:1811.10736 · cs.LG · November 28, 2018 · 6 cites

TL;DR

DONUT is a low-resource, CTC-based query-by-example keyword spotting system that enables personalized wakeword detection on embedded devices without cloud data upload.

Contribution

It introduces a novel CTC-based algorithm for online query-by-example keyword spotting that combines user adaptation with low computational requirements.

Findings

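01 Detects a personalized wakeword from a small number of user-recorded training examples

02 Has low computational requirements and runs on embedded systems without uploading user data to the cloud

03 Combines the generalization and interpretability of CTC-based keyword spotting with the user adaptation of a query-by-example system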

Abstract

Keyword spotting--or wakeword detection--is an essential feature for hands-free operation of modern voice-controlled devices. With such devices becoming ubiquitous, users might want to choose a personalized custom wakeword. In this work, we present DONUT, a CTC-based algorithm for online query-by-example keyword spotting that enables custom wakeword detection. The algorithm works by recording a small number of training examples from the user, generating a set of label sequence hypotheses from these training examples, and detecting the wakeword by aggregating the scores of all the hypotheses given a new audio recording. Our method combines the generalization and interpretability of CTC-based keyword spotting with the user-adaptation and convenience of a conventional query-by-example system. DONUT has low computational requirements and is well-suited for both learning and inference on…
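The three steps named in the abstract (record examples, generate label sequence hypotheses, aggregate hypothesis scores on new audio) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes an acoustic model that already emits per-frame log-posteriors over labels with a CTC blank at index 0, it swaps in greedy best-path decoding to obtain one hypothesis per enrollment example, and it aggregates with a log-sum-exp over the CTC log-likelihoods of the hypotheses. The function names (`toy_posteriors`, `greedy_hypothesis`, `enroll`, `wakeword_score`) are hypothetical.

```python
import numpy as np


def ctc_log_likelihood(log_probs, labels, blank=0):
    """log P(labels | audio) via the standard CTC forward algorithm.

    log_probs: array of shape (T, V) with per-frame log-posteriors.
    labels: sequence of label ids without blanks.
    """
    T = log_probs.shape[0]
    # Interleave blanks: (a, b) -> [blank, a, blank, b, blank]
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    S = len(ext)

    alpha = np.full(S, -np.inf)
    alpha[0] = log_probs[0, ext[0]]
    if S > 1:
        alpha[1] = log_probs[0, ext[1]]

    for t in range(1, T):
        prev = alpha
        alpha = np.full(S, -np.inf)
        for s in range(S):
            cands = [prev[s]]
            if s >= 1:
                cands.append(prev[s - 1])
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(prev[s - 2])
            alpha[s] = np.logaddexp.reduce(cands) + log_probs[t, ext[s]]

    # Valid alignments end in the last label or the trailing blank.
    tail = [alpha[S - 1]] + ([alpha[S - 2]] if S > 1 else [])
    return float(np.logaddexp.reduce(tail))


def greedy_hypothesis(log_probs, blank=0):
    """Best-path decoding (an assumed stand-in for hypothesis generation)."""
    out, prev = [], None
    for k in log_probs.argmax(axis=1):
        if k != prev and k != blank:
            out.append(int(k))
        prev = k
    return tuple(out)


def enroll(examples, blank=0):
    """Collect the distinct hypotheses decoded from the enrollment examples."""
    return sorted({greedy_hypothesis(lp, blank) for lp in examples})


def wakeword_score(log_probs, hypotheses, blank=0):
    """Aggregate hypothesis scores; log-sum-exp is an assumption here."""
    scores = [ctc_log_likelihood(log_probs, h, blank) for h in hypotheses]
    return float(np.logaddexp.reduce(scores))


def toy_posteriors(frame_labels, vocab_size=3, peak=0.9):
    """Hypothetical stand-in for an acoustic model: peaked per-frame posteriors."""
    n = len(frame_labels)
    probs = np.full((n, vocab_size), (1 - peak) / (vocab_size - 1))
    probs[np.arange(n), frame_labels] = peak
    return np.log(probs)


hypotheses = enroll([toy_posteriors([0, 1, 1, 0, 2, 0]),
                     toy_posteriors([1, 1, 0, 0, 2, 2])])
query_score = wakeword_score(toy_posteriors([0, 1, 0, 2, 2, 0]), hypotheses)
```

In a detector built this way, `query_score` would be compared against a threshold tuned on held-out audio. A larger hypothesis set (for example from a beam search instead of greedy decoding) would trade extra computation for robustness to decoding errors in any single enrollment example.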

Code & Models

Repositories

trinhtuanvubk/KWS-based-ASR

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing