Class-Incremental Learning for Sound Event Localization and Detection
Ruchi Pandey, Manjunath Mulimani, Archontis Politis, Annamaria Mesaros

TL;DR
This paper explores class-incremental learning for sound event localization and detection, proposing a method that learns new sound classes while retaining previous knowledge, validated on a realistic spatial sound dataset.
Contribution
It introduces an incremental learning approach for SELD that uses a mean square error-based distillation loss, so that new classes can be learned without forgetting old ones.
Findings
Maintains baseline performance across all metrics when evaluated on the full set of learned classes after the incremental phase.
Successfully learns new sound classes without significant performance degradation.
Validated on the TAU-NIGENS Spatial Sound Events 2021 dataset.
Abstract
This paper investigates the feasibility of class-incremental learning (CIL) for Sound Event Localization and Detection (SELD) tasks. The method features an incremental learner that can learn new sound classes independently while preserving knowledge of old classes. Continual learning is achieved through a mean square error-based distillation loss that minimizes output discrepancies between subsequent learners. The experiments are conducted on the TAU-NIGENS Spatial Sound Events 2021 dataset, which includes 12 different sound classes, and demonstrate the efficacy of the proposed method. We begin by learning 8 classes and introduce the 4 new classes at the next stage. After the incremental phase, the system is evaluated on the full set of learned classes. Results show that, for this realistic dataset, our proposed method successfully maintains baseline performance across all metrics.
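The distillation idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the output format, so the per-frame lists of per-class scalar outputs, the function names `distillation_loss` and `total_loss`, and the `weight` parameter are all hypothetical. Only the 8 + 4 class split and the use of mean square error between the outputs of subsequent learners come from the abstract.

```python
# Minimal sketch of an MSE-based distillation loss between two learners.
# Assumptions (not from the paper): each learner emits, per time frame,
# one scalar per class; the first N_OLD entries of the new learner's
# output correspond to the classes the old learner already knows.

N_OLD = 8  # classes learned in the first stage (from the abstract)
N_NEW = 4  # classes added in the incremental stage (from the abstract)


def mse(a, b):
    """Mean square error between two equal-length sequences of floats."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(old_outputs, new_outputs, n_old=N_OLD):
    """MSE between the old learner's outputs and the new learner's
    outputs restricted to the old-class slots, over all frames."""
    old_flat = [v for frame in old_outputs for v in frame]
    new_flat = [v for frame in new_outputs for v in frame[:n_old]]
    return mse(old_flat, new_flat)


def total_loss(task_loss, old_outputs, new_outputs, weight=1.0):
    """Hypothetical combination: loss on the new classes plus a
    weighted distillation term that discourages forgetting."""
    return task_loss + weight * distillation_loss(old_outputs, new_outputs)


# Usage: three frames; the new learner reproduces the old outputs exactly.
old = [[0.1 * i for i in range(N_OLD)] for _ in range(3)]
new = [frame + [0.5] * N_NEW for frame in old]
unchanged = distillation_loss(old, new)

# Shifting every old-class output by 1.0 is penalised by the loss.
shifted = [[v + 1.0 for v in frame[:N_OLD]] + frame[N_OLD:] for frame in new]
drifted = distillation_loss(old, shifted)
```

In this sketch the distillation term is zero when the new learner leaves the old-class outputs untouched and grows as those outputs drift, which is the behaviour the abstract describes as minimizing output discrepancies between subsequent learners.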
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis
Methods: Sparse Evolutionary Training
