Teaching keyword spotters to spot new keywords with limited examples
Abhijeet Awasthi, Kevin Kilgour, Hassan Rom

TL;DR
This paper introduces KeySEM, a speech embedding model pre-trained on keyword recognition that enables rapid and effective learning of new keywords from a few examples, making it suitable for personalized, on-device keyword spotting.
Contribution
We propose KeySEM, a novel pre-trained speech embedding model that improves few-shot keyword learning, generalizes across languages, and supports learning new keywords without re-training on previously learned ones.
Findings
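KeySEM outperforms existing methods while using fewer training examples.
Although pre-trained only on English utterances, it generalizes well to multiple languages.
It allows new keywords to be learned sequentially without re-training on previously learned keywords.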
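This summary does not say how KeySEM's embeddings are turned into a keyword classifier. The sketch below assumes a nearest-class-mean (prototype) classifier over frozen embeddings. That is a common few-shot baseline and not necessarily the authors' method. It shows how a keyword can be learned from a handful of examples, and how new keywords can be added one at a time without re-training on earlier ones. The class `PrototypeKeywordSpotter`, its `threshold` parameter, and the toy 3-dimensional vectors standing in for KeySEM embeddings are all hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (zero vectors are returned as zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0] * len(v)
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


class PrototypeKeywordSpotter:
    """Hypothetical few-shot keyword classifier over frozen embeddings.

    Each keyword is represented by the normalized mean of its few example
    embeddings; a query is matched to the prototype with the highest
    cosine similarity.
    """

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold  # minimum cosine score to accept a match
        self.prototypes: Dict[str, Vector] = {}

    def add_keyword(self, name: str, support: Sequence[Sequence[float]]) -> None:
        # Only this keyword's prototype is computed; existing prototypes are
        # untouched, so keywords can be added sequentially without re-training.
        if not support:
            raise ValueError("need at least one example embedding")
        normed = [l2_normalize(e) for e in support]
        dim = len(normed[0])
        mean = [sum(e[i] for e in normed) / len(normed) for i in range(dim)]
        self.prototypes[name] = l2_normalize(mean)

    def predict(self, embedding: Sequence[float]) -> Tuple[Optional[str], float]:
        q = l2_normalize(embedding)
        best_name, best_score = None, -math.inf
        for name, proto in self.prototypes.items():
            score = dot(q, proto)
            if score > best_score:
                best_name, best_score = name, score
        if best_score < self.threshold:
            return None, best_score  # reject: no enrolled keyword is close enough
        return best_name, best_score


# Toy 3-D vectors stand in for embeddings from a frozen encoder.
spotter = PrototypeKeywordSpotter(threshold=0.8)
spotter.add_keyword("lights on", [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]])
spotter.add_keyword("lights off", [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]])
print(spotter.predict([0.95, 0.05, 0.05])[0])  # lights on
print(spotter.predict([0.0, 0.0, 1.0])[0])  # None

# Later, a third keyword is added without revisiting the first two.
spotter.add_keyword("play music", [[0.0, 0.1, 1.0], [0.1, 0.0, 0.9]])
print(spotter.predict([0.0, 0.0, 1.0])[0])  # play music
```

Because each prototype depends only on its own examples, enrolling a keyword never modifies the others, and the rejection threshold lets the spotter ignore speech that matches no enrolled keyword. How well this works in practice depends on how well the embeddings separate keywords.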
Abstract
Learning to recognize new keywords with just a few examples is essential for personalizing keyword spotting (KWS) models to a user's choice of keywords. However, modern KWS models are typically trained on large datasets and restricted to a small vocabulary of keywords, limiting their transferability to a broad range of unseen keywords. Towards easily customizable KWS models, we present KeySEM (Keyword Speech EMbedding), a speech embedding model pre-trained on the task of recognizing a large number of keywords. Speech representations offered by KeySEM are highly effective for learning new keywords from a limited number of examples. Comparisons with a diverse range of related work across several datasets show that our method achieves consistently superior performance with fewer training examples. Although KeySEM was pre-trained only on English utterances, the performance gains also extend…
