TL;DR
This paper presents a few-shot transfer learning approach for multilingual keyword spotting that detects new keywords across multiple languages from just five training examples per keyword.
Contribution
It introduces a multilingual embedding model trained on a keyword bank automatically extracted from open speech corpora in nine languages, and shows that 5-shot fine-tuning generalizes to keywords in 13 languages unseen by the embedding model.
Findings
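Achieves an average F1 score of 0.75 with 5-shot models on 180 new keywords in the nine languages used to train the embedding model.
Attains an average F1 score of 0.65 with 5-shot models on 260 keywords across 13 new languages unseen by the embedding model.
Reaches an average streaming keyword spotting accuracy of 87.4% across 440 keywords in 22 languages.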
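For readers unfamiliar with the metric, the sketch below shows one plausible way an average F1 score over keywords could be computed: F1 is the harmonic mean of precision and recall for each keyword, and the per-keyword scores are then averaged without weighting. The function names and the example counts are hypothetical, and the unweighted averaging is an assumption; this summary does not state how the paper aggregates its scores.

```python
def f1_score(tp, fp, fn):
    """F1 from true-positive, false-positive, and false-negative counts."""
    if tp == 0:
        # No correct detections: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_f1(counts):
    """Unweighted mean of per-keyword F1 scores (assumed aggregation)."""
    scores = [f1_score(tp, fp, fn) for tp, fp, fn in counts]
    return sum(scores) / len(scores)


# Hypothetical (tp, fp, fn) counts for three keywords; not taken from the paper.
example_counts = [(8, 2, 2), (6, 2, 4), (9, 1, 3)]
print(round(average_f1(example_counts), 3))
```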
Abstract
We introduce a few-shot transfer learning method for keyword spotting in any language. Leveraging open speech corpora in nine languages, we automate the extraction of a large multilingual keyword bank and use it to train an embedding model. With just five training examples, we fine-tune the embedding model for keyword spotting and achieve an average F1 score of 0.75 on keyword classification for 180 new keywords unseen by the embedding model in these nine languages. This embedding model also generalizes to new languages. We achieve an average F1 score of 0.65 on 5-shot models for 260 keywords sampled across 13 new languages unseen by the embedding model. We investigate streaming accuracy for our 5-shot models in two contexts: keyword spotting and keyword search. Across 440 keywords in 22 languages, we achieve an average streaming keyword spotting accuracy of 87.4% with a false…
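The 5-shot setup described in the abstract can be illustrated with a deliberately simplified sketch: embeddings are treated as fixed vectors and only a small logistic-regression head is trained on five positive examples plus some non-keyword examples. This is an assumption made for illustration. The abstract says the embedding model itself is fine-tuned, and it does not describe the model, the negative examples, or the training procedure. All names below (`train_keyword_head`, `toy_embedding`, the cluster centers) are hypothetical, and random toy vectors stand in for real speech embeddings.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_keyword_head(positives, negatives, epochs=200, lr=0.5):
    """Fit a logistic-regression head on fixed embedding vectors with SGD."""
    dim = len(positives[0])
    weights, bias = [0.0] * dim, 0.0
    data = [(x, 1.0) for x in positives] + [(x, 0.0) for x in negatives]
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def keyword_probability(weights, bias, x):
    """Probability that embedding x belongs to the target keyword."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def toy_embedding(center, noise=0.1):
    """Hypothetical stand-in for an embedding: a noisy copy of a cluster center."""
    return [c + random.gauss(0.0, noise) for c in center]


random.seed(0)
keyword_center = [1.0] * 4 + [0.0] * 4  # toy cluster for the target keyword
other_center = [0.0] * 4 + [1.0] * 4    # toy cluster for non-keyword audio

five_shots = [toy_embedding(keyword_center) for _ in range(5)]
non_keyword = [toy_embedding(other_center) for _ in range(20)]
weights, bias = train_keyword_head(five_shots, non_keyword)

print(keyword_probability(weights, bias, toy_embedding(keyword_center)) > 0.5)
print(keyword_probability(weights, bias, toy_embedding(other_center)) < 0.5)
```

Training only a head keeps the example small. Fine-tuning the embedding model, as the abstract describes, would also update the network that produces the vectors.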
