Predicting detection filters for small footprint open-vocabulary keyword spotting

Theodore Bluche; Thibault Gisselbrecht

arXiv:1912.07575·cs.CL·September 30, 2020


TL;DR

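This paper introduces a neural network approach to open-vocabulary keyword spotting that weighs less than 250KB, can be customized to any keyword, and does not require task-specific data. It outperforms acoustic keyword spotting baselines on two tasks of detecting keywords in utterances and three tasks of detecting isolated speech commands.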

Contribution

A novel neural network model predicts detection filters for any keyword, enabling customizable voice interfaces without task-specific training data.

Findings

01 Model outperforms acoustic keyword spotting baselines

02 Supports fine-tuning for specific keywords

03 Maintains performance on new keywords after fine-tuning

Abstract

In this paper, we propose a fully neural approach to open-vocabulary keyword spotting that allows users to include a customizable voice interface in their device and that does not require task-specific data. We present a keyword detection neural network weighing less than 250KB, in which the topmost layer, which performs keyword detection, is predicted by an auxiliary network that may be run offline to generate a detector for any keyword. We show that the proposed model outperforms acoustic keyword spotting baselines by a large margin on two tasks of detecting keywords in utterances and three tasks of detecting isolated speech commands. We also propose a method to fine-tune the model when specific training data is available for some keywords, which yields a performance similar to a standard speech command neural network while keeping the ability of the model to be applied to new…
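The idea in the abstract, a detection layer whose weights are produced by a second network, can be illustrated with a small sketch. This is not the authors' architecture: the dimensions, the toy phoneme inventory, the mean-pooled embedding (which ignores phoneme order), and the names `predict_filter` and `detect` are all assumptions chosen for brevity, and random numbers stand in for trained parameters and real acoustic features.

```python
import numpy as np

rng = np.random.default_rng(0)

FEAT_DIM = 16  # size of each acoustic frame embedding (assumed)
EMB_DIM = 8    # size of each phoneme embedding (assumed)
PHONEMES = ["HH", "EY", "S", "N", "IH", "P", "AH", "L"]  # toy inventory

# Randomly initialised parameters stand in for trained weights.
phone_emb = {p: rng.normal(size=EMB_DIM) for p in PHONEMES}
W_aux = rng.normal(size=(EMB_DIM, FEAT_DIM + 1)) * 0.1


def predict_filter(phones):
    """Hypothetical auxiliary network: phoneme sequence -> (weights, bias)."""
    # Mean pooling is a simplification; it discards phoneme order.
    pooled = np.mean([phone_emb[p] for p in phones], axis=0)
    params = pooled @ W_aux
    return params[:FEAT_DIM], params[FEAT_DIM]


def detect(frames, weights, bias):
    """Top layer: per-frame sigmoid score, max-pooled over time."""
    logits = frames @ weights + bias
    scores = 1.0 / (1.0 + np.exp(-logits))
    return float(scores.max())


# The filter is predicted once, offline, for the chosen keyword.
weights, bias = predict_filter(["HH", "EY", "S"])

# On device, only the encoder and the predicted top layer need to run.
frames = rng.normal(size=(50, FEAT_DIM))  # stand-in for encoder output
score = detect(frames, weights, bias)
```

In this sketch, supporting a new keyword only means calling `predict_filter` again with a different phoneme sequence; the feature extractor is left untouched, which is the property the abstract describes as open-vocabulary.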
