Few-Shot Keyword Spotting from Mixed Speech

Junming Yuan; Ying Shi; LanTian Li; Dong Wang; Askar Hamdulla

arXiv:2407.06078 · cs.SD · July 9, 2024

Open Access

TL;DR

This paper explores combining Mix-Training with large-scale self-supervised learning (SSL) pre-training to improve few-shot keyword spotting in mixed speech, and reports that the approach is effective on the LibriSpeech and Google Speech Command datasets.

Contribution

The paper applies Mix-Training to the few-shot setting for mixed-speech keyword spotting and strengthens it with SSL pre-trained models such as HuBERT.

Findings

01. Mix-Training significantly improves few-shot mixed-speech KWS.

02. SSL pre-training with HuBERT enhances detection accuracy.

03. The combined approach achieves strong results across datasets.

Abstract

Few-shot keyword spotting (KWS) aims to detect unknown keywords with limited training samples. A commonly used approach is the pre-training and fine-tuning framework. While effective in clean conditions, this approach struggles with mixed keyword spotting -- simultaneously detecting multiple keywords blended in an utterance, which is crucial in real-world applications. Previous research has proposed a Mix-Training (MT) approach to solve the problem; however, it has never been tested in the few-shot scenario. In this paper, we investigate the possibility of using MT and other relevant methods to solve the two practical challenges together: few-shot and mixed speech. Experiments conducted on the LibriSpeech and Google Speech Command corpora demonstrate that MT is highly effective on this task when employed in either the pre-training phase or the fine-tuning phase. Moreover, combining…
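
To make "mixed speech" concrete, here is a minimal sketch of how a mixed training example with more than one keyword could be built: two waveforms are overlaid and the target becomes a multi-hot vector covering the keywords from both sources. This is an illustration under stated assumptions, not the authors' Mix-Training recipe. The function name `mix_utterances`, the `gain_b` parameter, the peak rescaling, and the toy sample values are all hypothetical.

```python
def mix_utterances(wave_a, wave_b, labels_a, labels_b, num_keywords, gain_b=1.0):
    """Overlay two waveforms and return (mixed_wave, multi_hot_target).

    Hypothetical helper for illustration; the paper's actual mixing
    procedure may differ.
    """
    length = max(len(wave_a), len(wave_b))
    # Zero-pad the shorter waveform so both cover the same time span.
    padded_a = list(wave_a) + [0.0] * (length - len(wave_a))
    padded_b = list(wave_b) + [0.0] * (length - len(wave_b))
    # Sample-wise sum, with an optional gain on the second source.
    mixed = [a + gain_b * b for a, b in zip(padded_a, padded_b)]
    # Rescale only if the sum would leave the [-1, 1] range.
    peak = max((abs(s) for s in mixed), default=0.0)
    if peak > 1.0:
        mixed = [s / peak for s in mixed]
    # Multi-hot target: a keyword present in either source is a positive.
    target = [0] * num_keywords
    for k in set(labels_a) | set(labels_b):
        target[k] = 1
    return mixed, target


# Toy example: keyword 0 in the first clip, keyword 2 in the second.
wave_a = [0.5, -0.5, 0.25]
wave_b = [0.5, 0.5]
mixed, target = mix_utterances(wave_a, wave_b, labels_a=[0], labels_b=[2], num_keywords=4)
```

With a multi-hot target, a detector can be trained with one independent binary decision per keyword, which is one common way to let several keywords be flagged in the same utterance.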

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques