Incremental Learning of Acoustic Scenes and Sound Events
Manjunath Mulimani, Annamaria Mesaros

TL;DR
This paper introduces an incremental learning approach in which a single CNN learns acoustic scene classification and then audio tagging. It addresses catastrophic forgetting with independent learning and knowledge distillation, and reaches 94.0% accuracy on ASC and a 54.4% F1 score on AT on the TUT 2016/2017 dataset.
Contribution
It presents a novel incremental learning method combining independent learning and knowledge distillation to mitigate forgetting in acoustic scene and sound event tasks.
Findings
Achieved 94.0% accuracy on the ASC task
Obtained a 54.4% F1 score on the AT task
ASC accuracy decreased by only 5.1 percentage points after the learner was trained on the AT task
Abstract
In this paper, we propose a method for incremental learning of two distinct tasks over time: acoustic scene classification (ASC) and audio tagging (AT). We use a simple convolutional neural network (CNN) model as an incremental learner to solve the tasks. Generally, incremental learning methods catastrophically forget the previous task when sequentially trained on a new task. To alleviate this problem, we propose independent learning and knowledge distillation (KD) between the timesteps in learning. Experiments are performed on the TUT 2016/2017 dataset, containing 4 acoustic scene classes and 25 sound event classes. The proposed incremental learner first solves the ASC task with an accuracy of 94.0%. Next, it learns to solve the AT task with an F1 score of 54.4%. At the same time, its performance on the previous ASC task decreases only by 5.1 percentage points due to the additional…
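The abstract does not spell out the loss, so the following is only a minimal NumPy sketch of how knowledge distillation between two learning timesteps is commonly set up, not the authors' implementation. It assumes a frozen copy of the ASC-trained model supplies soft targets for the 4 scene classes while the current model is trained on the 25-class tagging task. The function names, the temperature, and the weight kd_weight are all hypothetical choices made for illustration.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis, softened by a temperature."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(old_logits, new_logits, temperature=2.0):
    """Cross-entropy between softened outputs of the previous and current model.

    Assumption: a generic KD term; the paper may use a different form.
    """
    p_old = softmax(old_logits, temperature)
    log_p_new = np.log(softmax(new_logits, temperature) + 1e-12)
    return float(-(p_old * log_p_new).sum(axis=-1).mean())

def tagging_loss(targets, logits):
    """Binary cross-entropy for multi-label audio tagging (one sigmoid per tag)."""
    p = 1.0 / (1.0 + np.exp(-logits))
    eps = 1e-12
    bce = targets * np.log(p + eps) + (1.0 - targets) * np.log(1.0 - p + eps)
    return float(-bce.mean())

def incremental_loss(at_targets, at_logits, old_asc_logits, new_asc_logits,
                     kd_weight=1.0, temperature=2.0):
    """New-task loss plus a weighted KD term that preserves old-task behaviour."""
    kd = distillation_loss(old_asc_logits, new_asc_logits, temperature)
    return tagging_loss(at_targets, at_logits) + kd_weight * kd

# Toy batch with random numbers standing in for real model outputs.
rng = np.random.default_rng(0)
batch = 8
old_asc = rng.normal(size=(batch, 4))             # frozen ASC-model logits
new_asc = old_asc + rng.normal(size=(batch, 4))   # current model has drifted
at_logits = rng.normal(size=(batch, 25))          # current model, tagging head
at_targets = (rng.random((batch, 25)) < 0.2).astype(float)

loss = incremental_loss(at_targets, at_logits, old_asc, new_asc)
print(f"combined loss: {loss:.4f}")
```

The KD term is smallest when the current model reproduces the previous model's scene predictions, so minimizing the combined loss trades off learning the tags against drifting away from the ASC solution.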
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Diverse Musicological Studies
Methods: Knowledge Distillation
