Domain-Incremental Continual Learning for Robust and Efficient Keyword Spotting in Resource Constrained Systems
Prakash Dhungana, Sayed Ahmad Salehi

TL;DR
This paper presents a domain-incremental continual learning framework for keyword spotting that improves robustness and efficiency on resource-constrained edge devices and adapts to domain shifts caused by noise and recording conditions.
Contribution
The paper introduces a continual learning pipeline that combines a dual-input CNN, multi-stage denoising, and prototype-based sample selection, and that updates the entire compact model rather than only selected layers.
Findings
Achieves 99.63% accuracy on clean data
Maintains over 94% accuracy in noisy environments at -10 dB SNR
Demonstrates robustness across diverse noise conditions
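For context on the -10 dB figure: SNR in decibels is 10 · log10(P_signal / P_noise), so at -10 dB the noise carries ten times the power of the speech. The sketch below shows one common way to mix noise into a clip at a target SNR. It is an illustration only, not the authors' evaluation code; the helper names `power`, `mix_at_snr`, and `snr_db` are hypothetical.

```python
import math


def power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in samples) / len(samples)


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    return 10 * math.log10(power(signal) / power(noise))


def mix_at_snr(signal, noise, target_snr_db):
    """Scale `noise` so the mixture has the requested SNR, then add it.

    Returns (mixture, scaled_noise). Assumes equal-length, non-silent inputs.
    """
    if len(signal) != len(noise):
        raise ValueError("signal and noise must have the same length")
    p_signal, p_noise = power(signal), power(noise)
    if p_signal == 0 or p_noise == 0:
        raise ValueError("signal and noise must both be non-silent")
    # Noise power needed for the target SNR.
    target_noise_power = p_signal / (10 ** (target_snr_db / 10))
    # Amplitude gain is the square root of the power ratio.
    gain = math.sqrt(target_noise_power / p_noise)
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(signal, scaled_noise)]
    return mixture, scaled_noise
```

Calling `mix_at_snr(clip, noise, -10)` scales the noise so that its power is ten times that of the clip before the two are summed.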
Abstract
Keyword Spotting (KWS) systems with small-footprint models deployed on edge devices face significant accuracy and robustness challenges due to domain shifts caused by varying noise and recording conditions. To address this, we propose a comprehensive framework for continual learning designed to adapt to new domains while maintaining computational efficiency. The proposed pipeline integrates a dual-input Convolutional Neural Network that uses both Mel Frequency Cepstral Coefficients (MFCC) and Mel-spectrogram features, supported by a multi-stage denoising process involving discrete wavelet transform and spectral subtraction techniques, plus model and prototype update blocks. Unlike prior methods that restrict updates to specific layers, our approach updates the complete quantized model, which is made possible by the compact model architecture. A subset of input samples are selected during…
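The abstract names the discrete wavelet transform as one stage of the denoising process, but this summary does not state the wavelet family, the decomposition depth, or the thresholding rule. As a rough illustration of the general idea only, the sketch below applies a one-level Haar transform and soft-thresholds the detail coefficients. Every choice here (Haar, a single level, soft thresholding, the function names) is an assumption made for the example and should not be read as the paper's method.

```python
import math

SQRT2 = math.sqrt(2)


def haar_dwt(x):
    """One-level Haar DWT of an even-length sequence -> (approx, detail)."""
    if len(x) % 2 != 0:
        raise ValueError("input length must be even")
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuild the interleaved sample sequence."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out


def soft_threshold(coeffs, threshold):
    """Shrink each coefficient toward zero by `threshold`, clamping at zero."""
    return [math.copysign(max(abs(c) - threshold, 0.0), c) for c in coeffs]


def denoise(x, threshold):
    """Assumed scheme: shrink the detail band, keep the approximation band."""
    approx, detail = haar_dwt(x)
    return haar_idwt(approx, soft_threshold(detail, threshold))
```

With a threshold of zero the signal is reconstructed unchanged; with a very large threshold each pair of samples collapses to its mean, which shows how the threshold trades noise removal against loss of fine detail.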
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
