Progressive Continual Learning for Spoken Keyword Spotting
Yizheng Huang, Nana Hou, Nancy F. Chen

TL;DR
This paper introduces PCL-KWS, a progressive continual learning framework for spoken keyword spotting that effectively learns new keywords sequentially without forgetting previous ones, maintaining high accuracy with minimal model growth.
Contribution
The paper proposes a novel progressive continual learning strategy with task-specific sub-networks and keyword-aware scaling, enabling incremental learning in KWS without catastrophic forgetting.
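The growth pattern described above can be pictured with a toy sketch. It is a hypothetical illustration under stated assumptions, not the authors' implementation. Each new task adds one sub-network. Earlier sub-networks are frozen so that previously learned keywords are not overwritten. "Keyword-aware scaling" is modeled as a hidden width proportional to the number of new keywords, with a floor. The names `ProgressiveKWS`, `SubNetwork`, `units_per_keyword`, and `min_width` are assumptions, and so is the one-hidden-layer parameter count.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SubNetwork:
    task_id: int
    keywords: List[str]
    width: int
    frozen: bool = False

    def num_params(self, in_dim: int) -> int:
        # Hypothetical layout: one hidden layer (in_dim -> width) plus a
        # classifier (width -> number of keywords), both with biases.
        hidden = in_dim * self.width + self.width
        head = self.width * len(self.keywords) + len(self.keywords)
        return hidden + head


@dataclass
class ProgressiveKWS:
    in_dim: int
    units_per_keyword: int  # hypothetical scaling factor
    min_width: int = 16     # hypothetical lower bound on sub-network width
    subnets: List[SubNetwork] = field(default_factory=list)

    def add_task(self, keywords: List[str]) -> SubNetwork:
        # Freeze every existing sub-network so old keywords are preserved.
        for net in self.subnets:
            net.frozen = True
        # Assumed keyword-aware scaling: width grows with the keyword count.
        width = max(self.min_width, self.units_per_keyword * len(keywords))
        net = SubNetwork(len(self.subnets), list(keywords), width)
        self.subnets.append(net)
        return net

    def total_params(self) -> int:
        return sum(net.num_params(self.in_dim) for net in self.subnets)
```

For example, with `in_dim=40` and `units_per_keyword=8`, adding a two-keyword task and then a three-keyword task gives widths 16 and 24. The first sub-network is frozen when the second is added. Total size grows with the number of new keywords rather than by a fixed block per task.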
Findings
Achieves 92.8% average accuracy across all tasks on the Google Speech Commands dataset after learning five new tasks sequentially.
Outperforms existing baselines in continual learning for KWS.
Maintains high performance with constrained model growth.
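The headline number is an average accuracy over tasks. The summary does not spell out the formula, so we assume a common continual-learning definition: ACC_T = (1/T) · sum over i = 1..T of a_{T,i}. Here a_{T,i} is the accuracy on task i, measured after training on the T-th task. The sketch below is minimal; the function name `average_accuracy` and the accuracy-matrix layout are hypothetical.

```python
from typing import Sequence


def average_accuracy(acc_matrix: Sequence[Sequence[float]]) -> float:
    """Average accuracy over all tasks after the final task.

    acc_matrix[t][i] is the accuracy on task i measured after training on
    task t (0-indexed, i <= t), so row t has t + 1 entries.
    """
    if not acc_matrix:
        raise ValueError("need at least one task")
    final_row = acc_matrix[-1]  # accuracies on every task after the last one
    num_tasks = len(acc_matrix)
    if len(final_row) != num_tasks:
        raise ValueError("final row must cover every task seen so far")
    return sum(final_row) / num_tasks
```

Every earlier task is re-evaluated after the last one is learned. Any forgetting of old keywords therefore lowers this score directly, which makes it a natural measure for the problem the paper targets.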
Abstract
Catastrophic forgetting is a thorny challenge when updating keyword spotting (KWS) models after deployment. To tackle this challenge, we propose a progressive continual learning strategy for small-footprint spoken keyword spotting (PCL-KWS). Specifically, the proposed PCL-KWS framework introduces a network instantiator that generates task-specific sub-networks for remembering previously learned keywords. As a result, the PCL-KWS approach incrementally learns new keywords without forgetting prior knowledge. In addition, the keyword-aware network scaling mechanism of PCL-KWS constrains the growth of model parameters while achieving high performance. Experimental results show that after learning five new tasks sequentially, our proposed PCL-KWS approach achieves a new state-of-the-art performance of 92.8% average accuracy for all the tasks on the Google Speech Commands dataset compared with…
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
