Analytic Incremental Learning For Sound Source Localization With Imbalance Rectification

Zexia Fan; Yu Chen; Qiquan Zhang; Kainan Chen; Xinyuan Qian

arXiv:2601.18335 · cs.SD · January 27, 2026

TL;DR

This paper introduces a unified framework for sound source localization that addresses intra-task and inter-task imbalances, improving accuracy and robustness in real-world scenarios without needing exemplar storage.

Contribution

It proposes a GCC-PHAT-based data augmentation (GDA) that mitigates intra-task distribution skews, and an analytic dynamic imbalance rectifier (ADIR) that adapts to inter-task imbalance, together reducing catastrophic forgetting in SSL models.
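
GCC-PHAT (generalized cross-correlation with phase transform) is the standard time-delay feature that GDA builds on. The sketch below computes it for one microphone pair with NumPy and reads off the peak lag. It shows only the textbook feature, not the paper's augmentation, whose use of the peak characteristics is not detailed on this page. The function name `gcc_phat`, the synthetic white-noise signals, the 16 kHz sample rate and the 1 ms lag window are illustrative assumptions.

```python
import numpy as np


def gcc_phat(sig, ref, fs, max_tau=None):
    """GCC-PHAT between two channels.

    Returns (tau, cc): the delay of `sig` relative to `ref` in seconds
    (positive when `sig` lags `ref`), and the correlation over lags
    -max_shift..+max_shift samples.
    """
    n = len(sig) + len(ref)                      # zero-pad to avoid circular wrap-around
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)                   # cross-power spectrum
    cc = np.fft.irfft(cross / (np.abs(cross) + 1e-12), n=n)  # PHAT: keep phase only
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[n - max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs, cc


# Usage: white noise at the reference mic, a 5-sample delayed copy at the other.
rng = np.random.default_rng(0)
fs = 16000
ref = rng.standard_normal(4096)
delay = 5
sig = np.concatenate((np.zeros(delay), ref[:-delay]))
tau, cc = gcc_phat(sig, ref, fs, max_tau=0.001)  # search lags within +/-1 ms
print(round(tau * fs))  # → 5
```

Dividing the cross-spectrum by its magnitude keeps only phase, which sharpens the correlation peak; the lag of that peak is the time difference of arrival, which maps to a DoA for a given microphone geometry.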

Findings

01. Achieves 89.0% accuracy on the SSLR benchmark.
02. Reduces the mean absolute error (MAE) to 5.3 degrees.
03. Reports a backward transfer (BWT) of 1.6, indicating effective continual learning.
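
The three findings rely on standard evaluation quantities. The sketch below shows how they are commonly computed, assuming azimuth-only DoA in degrees with circular wrap-around, an accuracy tolerance `threshold_deg` of 5° (an assumption; the page does not state the tolerance), and the GEM-style definition of backward transfer over an accuracy matrix `R`. All function names are hypothetical, and the numbers in the usage lines are made up for illustration, not taken from the paper.

```python
def angular_error(pred_deg, true_deg):
    """Absolute azimuth error in degrees, wrapped to [0, 180]."""
    d = abs(pred_deg - true_deg) % 360.0
    return min(d, 360.0 - d)


def mae_and_accuracy(preds, targets, threshold_deg=5.0):
    """Mean absolute error (degrees) and the fraction of predictions whose
    error is within `threshold_deg` (the tolerance is an assumption)."""
    errs = [angular_error(p, t) for p, t in zip(preds, targets)]
    mae = sum(errs) / len(errs)
    acc = sum(e <= threshold_deg for e in errs) / len(errs)
    return mae, acc


def backward_transfer(R):
    """GEM-style BWT. R[i][j] is the accuracy on task j after training on
    task i; averages the change on each earlier task after the final task."""
    T = len(R)
    return sum(R[T - 1][j] - R[j][j] for j in range(T - 1)) / (T - 1)


# Usage with made-up numbers (not the paper's results).
mae, acc = mae_and_accuracy([358, 12, 90, 40], [2, 10, 100, 40])
print(mae, acc)  # → 4.0 0.75  (358° vs 2° counts as 4°, not 356°)

R = [[90.0, 0.0, 0.0],
     [85.0, 88.0, 0.0],
     [87.0, 86.0, 89.0]]
bwt = backward_transfer(R)
print(bwt)  # → -2.5
```

Under this definition a negative BWT signals forgetting, while a positive value means accuracy on earlier tasks rose on average after later tasks were learned.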

Abstract

Sound source localization (SSL) demonstrates remarkable results in controlled settings but struggles in real-world deployment due to dual imbalance challenges: intra-task imbalance arising from long-tailed direction-of-arrival (DoA) distributions, and inter-task imbalance induced by cross-task skews and overlaps. These often lead to catastrophic forgetting, significantly degrading the localization accuracy. To mitigate these issues, we propose a unified framework with two key innovations. Specifically, we design a GCC-PHAT-based data augmentation (GDA) method that leverages peak characteristics to alleviate intra-task distribution skews. We also propose an Analytic dynamic imbalance rectifier (ADIR) with task-adaption regularization, which enables analytic updates that adapt to inter-task dynamics. On the SSLR benchmark, our proposal achieves state-of-the-art (SoTA) results of 89.0%…
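
The page does not give ADIR's update equations. In the analytic-learning line of continual learning, "analytic updates" usually means a linear head on frozen features whose least-squares solution is updated recursively in closed form, so no exemplars from past tasks are stored and the result matches training on all data jointly. The sketch below shows only that generic recursive least-squares mechanism under stated assumptions (random stand-in features, a fixed set of DoA classes known in advance, ridge parameter `gamma`). The class `AnalyticClassifier` is hypothetical, and ADIR's imbalance rectification and task-adaption regularization are not reproduced.

```python
import numpy as np


class AnalyticClassifier:
    """Hypothetical ridge-regression head on frozen features, updated in
    closed form one task at a time (recursive least squares)."""

    def __init__(self, dim, num_classes, gamma=1.0):
        self.R = np.eye(dim) / gamma              # (X^T X + gamma I)^{-1}; no data seen yet
        self.W = np.zeros((dim, num_classes))     # ridge weights

    def update(self, X, Y):
        """Absorb one task: features X (n x dim), one-hot targets Y (n x C).
        Past samples are never revisited."""
        n = X.shape[0]
        # Woodbury identity: update the regularized inverse autocorrelation
        # matrix instead of re-inverting X^T X over all data seen so far.
        K = np.linalg.solve(np.eye(n) + X @ self.R @ X.T, X @ self.R)
        self.R = self.R - self.R @ X.T @ K
        # Correct the weights with the residual on the new task only.
        self.W = self.W + self.R @ X.T @ (Y - X @ self.W)

    def predict(self, X):
        return np.argmax(X @ self.W, axis=1)


# Usage: two tasks covering disjoint DoA classes, with random stand-in features.
rng = np.random.default_rng(0)
dim, C, gamma = 16, 4, 1.0
X1, X2 = rng.standard_normal((50, dim)), rng.standard_normal((30, dim))
Y1 = np.eye(C)[rng.integers(0, 2, 50)]   # task 1: classes 0-1
Y2 = np.eye(C)[rng.integers(2, 4, 30)]   # task 2: classes 2-3

clf = AnalyticClassifier(dim, C, gamma)
clf.update(X1, Y1)
clf.update(X2, Y2)

# Same weights as ridge regression fitted on both tasks at once.
X, Y = np.vstack([X1, X2]), np.vstack([Y1, Y2])
W_joint = np.linalg.solve(X.T @ X + gamma * np.eye(dim), X.T @ Y)
print(np.allclose(clf.W, W_joint))  # → True
```

Only the `dim × dim` matrix `R` and the weights `W` persist between tasks, so memory does not grow with the number of tasks or samples, which is what makes this family of methods exemplar-free.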

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing