Fully Unsupervised Training of Few-shot Keyword Spotting

Dongjune Lee; Minchan Kim; Sung Hwan Mun; Min Hyun Han; Nam Soo Kim

arXiv:2210.02732·eess.AS·October 10, 2022·SLT

TL;DR

This paper introduces an unsupervised, synthetic-data-based approach for few-shot keyword spotting that leverages metric learning and speech synthesis to eliminate the need for labeled datasets.

Contribution

It presents a fully unsupervised FS-KWS system trained solely on synthetic speech data using metric learning and speech synthesis with pseudo phonemes.

Findings

01 Competitive performance on real datasets without labeled data

02 Effective use of synthetic multi-view samples for training

03 Elimination of the need for large labeled datasets

Abstract

For training a few-shot keyword spotting (FS-KWS) model, a large labeled dataset covering a massive number of target keywords has been considered essential for generalizing to arbitrary target keywords from only a few enrollment samples. To alleviate the cost of collecting and labeling such data, we propose a novel FS-KWS system trained only on synthetic data. The proposed system is based on metric learning, which enables target keywords to be detected using distance metrics. By exploiting a speech synthesis model that generates speech from pseudo phonemes instead of text, we easily obtain a large collection of multi-view samples with the same semantics. These samples are sufficient for training, since metric learning does not intrinsically require labeled data. None of the components in our framework requires any supervision, making our method fully unsupervised. Experimental results on…
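The two ideas in the abstract, training on multi-view samples without labels and detecting keywords by distance, can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the InfoNCE-style loss is a common metric-learning objective and not necessarily the one used in the paper, and the embeddings stand in for the output of some encoder applied to synthesized speech.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def multiview_contrastive_loss(view_a, view_b, temperature=0.1):
    """InfoNCE-style loss (an assumed objective, not taken from the paper).

    Row i of view_a and row i of view_b are embeddings of two views of the
    same content, e.g. the same pseudo-phoneme sequence synthesized twice.
    No keyword labels are needed: the pairing itself is the supervision.
    """
    a = l2_normalize(view_a)
    b = l2_normalize(view_b)
    logits = a @ b.T / temperature                       # (N, N) similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Matching pairs sit on the diagonal; maximize their log-probability.
    return float(-np.mean(np.diag(log_probs)))


def enroll(enrollment_embeddings):
    """Average a few enrollment embeddings into one unit-length prototype."""
    return l2_normalize(l2_normalize(enrollment_embeddings).mean(axis=0))


def detect(query_embedding, prototype, threshold=0.5):
    """Accept the query if its cosine distance to the prototype is small.

    The threshold value is a hypothetical default and would be tuned in practice.
    """
    distance = 1.0 - float(l2_normalize(query_embedding) @ prototype)
    return distance < threshold, distance


# Toy usage with random vectors standing in for encoder outputs.
rng = np.random.default_rng(0)
base = rng.normal(size=(8, 16))                     # 8 distinct "contents"
view_a = base + 0.01 * rng.normal(size=base.shape)  # first synthetic view
view_b = base + 0.01 * rng.normal(size=base.shape)  # second synthetic view

aligned_loss = multiview_contrastive_loss(view_a, view_b)
shuffled_loss = multiview_contrastive_loss(view_a, np.roll(view_b, 1, axis=0))

# Enroll a keyword from three noisy samples of base[0], then query it.
samples = base[0] + 0.01 * rng.normal(size=(3, 16))
prototype = enroll(samples)
hit, hit_distance = detect(base[0], prototype)
miss, miss_distance = detect(-base[0], prototype)
```

In this toy setup the loss is lower when the two views are correctly paired than when they are shuffled, and a query close to the enrolled samples is accepted while an opposite-direction query is rejected. A real system would replace the random vectors with encoder embeddings of audio.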

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Music and Audio Processing