Few-Shot Keyword Spotting With Prototypical Networks

Archit Parnami; Minwoo Lee

arXiv:2007.14463 · eess.AS · June 14, 2022

TL;DR

This paper introduces a few-shot keyword spotting method using prototypical networks and metric learning, enabling recognition of new user-defined keywords with minimal samples, supported by a new dataset and experimental validation.

Contribution

It formulates few-shot keyword spotting as a metric learning problem and proposes a novel approach using temporal and dilated convolutions on prototypical networks.
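As a rough illustration of the convolutional building block named above, the following sketch applies a dilated convolution along the time axis of a feature sequence. It is not the authors' architecture: the function name `dilated_temporal_conv`, the array shapes, and the use of plain NumPy are assumptions made for illustration only.

```python
import numpy as np


def dilated_temporal_conv(features, kernel, dilation=1):
    """Valid (unpadded) 1-D convolution along time with a dilated kernel.

    features: (time, channels_in) array, e.g. a sequence of audio feature frames
    kernel:   (taps, channels_in, channels_out) array of weights
    returns:  (time - (taps - 1) * dilation, channels_out) array
    """
    taps = kernel.shape[0]
    span = (taps - 1) * dilation          # extra frames covered by the dilated kernel
    out_len = features.shape[0] - span
    if out_len <= 0:
        raise ValueError("input is shorter than the dilated receptive field")

    out = np.zeros((out_len, kernel.shape[2]))
    for k in range(taps):
        # Tap k reads the frames shifted by k * dilation time steps.
        start = k * dilation
        out += features[start:start + out_len] @ kernel[k]
    return out


# Hypothetical usage: 10 frames, 1 channel, a 3-tap kernel with dilation 2.
frames = np.arange(10, dtype=float).reshape(10, 1)
weights = np.ones((3, 1, 1))
result = dilated_temporal_conv(frames, weights, dilation=2)
```

With a dilation of 2, a 3-tap kernel covers 5 frames instead of 3, so the receptive field grows without adding parameters.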

Findings

01 Effective recognition of new keywords with few samples

02 Proposed method outperforms baseline approaches

03 Published a new Few-shot Google Speech Commands dataset

Abstract

Keyword spotting, the task of recognizing a particular command or keyword, is widely used in voice interfaces such as Amazon's Alexa and Google Home. To recognize a set of keywords, most recent deep-learning-based approaches use a neural network trained on a large number of samples to identify certain pre-defined keywords. This prevents the system from recognizing new, user-defined keywords. Therefore, we first formulate this problem as few-shot keyword spotting and approach it using metric learning. To enable this research, we also synthesize and publish a Few-shot Google Speech Commands dataset. We then propose a solution to the few-shot keyword spotting problem using temporal and dilated convolutions on prototypical networks. Our comparative experimental results demonstrate keyword spotting of new keywords using just a small number of samples.
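The metric-learning idea behind prototypical networks can be sketched in a few lines. In the standard formulation, each keyword class is represented by the mean of its few support embeddings (its prototype), and a query is assigned to the nearest prototype. The sketch below assumes the embeddings have already been produced by some encoder; the function names, the toy 2-D embeddings, and the choice of squared Euclidean distance are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def class_prototypes(support_emb, support_labels):
    """Average the support embeddings of each class into one prototype."""
    classes = np.unique(support_labels)   # sorted unique class labels
    protos = np.stack(
        [support_emb[support_labels == c].mean(axis=0) for c in classes]
    )
    return classes, protos


def classify(query_emb, classes, protos):
    """Assign each query to the class of its nearest prototype."""
    # Squared Euclidean distance between every query and every prototype.
    diff = query_emb[:, None, :] - protos[None, :, :]
    dists = (diff ** 2).sum(axis=-1)

    # Softmax over negative distances gives class probabilities.
    logits = -dists
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=1, keepdims=True)

    return classes[dists.argmin(axis=1)], probs


# Hypothetical 2-way, 2-shot episode with toy 2-D embeddings.
support = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]])
labels = np.array(["yes", "yes", "no", "no"])
queries = np.array([[1.0, 1.0], [9.0, 11.0]])

classes, protos = class_prototypes(support, labels)
predicted, probs = classify(queries, classes, protos)
```

Because classification depends only on distances to prototypes, a new user-defined keyword can be added by embedding a handful of examples and averaging them, with no retraining of the classifier head.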

Code & Models

Repositories

ArchitParnami/Few-Shot-KWS (PyTorch, official)
