Scaling few-shot spoken word classification with generative meta-continual learning
Louise Beyers, Batsirayi Mupamhi Ziki, Ruan van der Merwe

TL;DR
This paper demonstrates that the GeMCL algorithm enables a spoken word classifier to scale to 1000 classes with only five examples each, adapting far faster than repeatedly trained or finetuned baselines while keeping performance stable.
Contribution
It applies Generative Meta-Continual Learning (GeMCL) to large-scale few-shot spoken word classification, showing significant speed and data efficiency improvements.
Findings
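GeMCL achieves stable performance while sequentially learning 1000 classes from only five shots per class.
GeMCL adapts 2000 times faster than a frozen HuBERT model with a repeatedly trained classifier head, while producing comparable performance.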
GeMCL requires less than half the training data and time.
Abstract
Few-shot spoken word classification has largely been developed for applications where a small number of classes is considered, and so the potential of larger-scale few-shot spoken word classification remains untapped. This paper investigates the potential of a spoken word classifier to sequentially learn to distinguish between 1000 classes when it is given only five shots per class. We demonstrate that this scaling capability exists by training a model using the Generative Meta-Continual Learning (GeMCL) algorithm and comparing it to repeatedly trained or finetuned baselines. We find that GeMCL produces exceptionally stable performance, and although it does not always outperform a repeatedly fully-finetuned HuBERT model nor a frozen HuBERT model with a repeatedly trained classifier head, it produces comparable performance to the latter while adapting 2000 times faster, having been…
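The sequential few-shot setting described in the abstract can be illustrated with a toy sketch. This is not the GeMCL implementation: it swaps in a plain nearest-class-mean classifier over fixed embeddings, a simpler stand-in chosen only to show how a new class can be added from a handful of examples by updating per-class statistics, with no retraining of a classifier head. The names `RunningPrototypeClassifier`, `learn`, and `predict` are hypothetical, and the embeddings are hand-made 2-D vectors rather than outputs of a speech encoder.

```python
from collections import defaultdict


class RunningPrototypeClassifier:
    """Toy nearest-class-mean classifier, updated one example at a time.

    Assumption: a simplified stand-in for illustration, not GeMCL itself.
    """

    def __init__(self, dim):
        self.dim = dim
        # Per-class running sum of embeddings and example count.
        self.sums = defaultdict(lambda: [0.0] * dim)
        self.counts = defaultdict(int)

    def learn(self, embedding, label):
        """Absorb one labelled example; cost does not depend on class count."""
        if len(embedding) != self.dim:
            raise ValueError("embedding has the wrong dimension")
        total = self.sums[label]
        for i, value in enumerate(embedding):
            total[i] += value
        self.counts[label] += 1

    def predict(self, embedding):
        """Return the label whose class mean is closest (squared Euclidean)."""
        if not self.counts:
            raise ValueError("no classes have been learned yet")
        best_label, best_dist = None, float("inf")
        for label, total in self.sums.items():
            n = self.counts[label]
            dist = sum((v - s / n) ** 2 for v, s in zip(embedding, total))
            if dist < best_dist:
                best_label, best_dist = label, dist
        return best_label


# Classes arrive one after another, each with only a few shots.
clf = RunningPrototypeClassifier(dim=2)
shots = {
    "yes": [[1.0, 0.0], [0.9, 0.1]],
    "no": [[0.0, 1.0], [0.1, 0.9]],
}
for label, examples in shots.items():
    for example in examples:
        clf.learn(example, label)

prediction = clf.predict([0.8, 0.2])
```

In this sketch, adding a class only touches that class's own sum and count, so earlier classes are left unchanged. That is the property that makes sequential adaptation cheap compared with retraining a classifier head each time the class set grows.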
