Continual Learning for Acoustic Event Classification
Yang Xiao

TL;DR
This paper introduces two novel diversity-aware incremental learning methods for acoustic event classification, enabling on-device models to learn new classes continuously without catastrophic forgetting, while maintaining computational efficiency.
Contribution
The paper proposes two new diversity-aware incremental learning techniques tailored for spoken keyword spotting and environmental sound classification, improving accuracy and memory efficiency.
Findings
Achieved a 4.2% accuracy improvement on the Google Speech Command dataset.
Outperformed baseline continual learning methods on the DCASE 2019 and ESC-50 datasets.
Reduced computational cost compared to traditional perturbation methods.
Abstract
Continuously learning new classes without catastrophic forgetting is a challenging problem for on-device acoustic event classification given the restrictions on computation resources (e.g., model size, running memory). To alleviate this issue, we propose two novel diversity-aware incremental learning methods for Spoken Keyword Spotting and Environmental Sound Classification. Our methods select the historical data for training by measuring the per-sample classification uncertainty. For the Spoken Keyword Spotting application, the proposed RK approach introduces a diversity-aware sampler that selects a diverse set from historical and incoming keywords by calculating classification uncertainty. As a result, the RK approach can incrementally learn new tasks without forgetting prior knowledge. In addition, the RK approach proposes data augmentation and knowledge distillation loss…
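The abstract's idea of selecting replay data by per-sample classification uncertainty can be illustrated with a minimal sketch. This is not the paper's RK sampler: the excerpt does not say how uncertainty is measured or how diversity is enforced, so the sketch assumes predictive entropy as the uncertainty score and picks samples at evenly spaced positions along the uncertainty ranking. The function names `predictive_entropy` and `select_diverse_by_uncertainty` are hypothetical.

```python
import numpy as np


def predictive_entropy(probs):
    """Per-sample entropy of class-probability rows (higher = more uncertain).

    Assumption: entropy stands in for the paper's uncertainty measure,
    which the abstract does not specify.
    """
    p = np.clip(np.asarray(probs, dtype=float), 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)


def select_diverse_by_uncertainty(probs, budget):
    """Return indices of `budget` samples spread across the uncertainty ranking.

    Assumption: "diverse" is approximated by covering confident, borderline,
    and uncertain samples rather than keeping only one end of the ranking.
    """
    scores = predictive_entropy(probs)
    order = np.argsort(scores)  # least uncertain first
    if budget >= len(order):
        return order
    # Evenly spaced positions along the sorted ranking.
    positions = np.linspace(0, len(order) - 1, num=budget).round().astype(int)
    return order[positions]
```

Given the softmax outputs of the current model on stored and incoming samples, `select_diverse_by_uncertainty(probs, budget)` returns the indices to keep in a fixed-size replay buffer. Spreading the picks across the ranking is one simple design choice; a per-class budget or a clustering step would be equally plausible readings of "diversity-aware".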
Taxonomy
Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing
