Iterative Sound Source Localization for Unknown Number of Sources

Yanjie Fu; Meng Ge; Haoran Yin; Xinyuan Qian; Longbiao Wang; Gaoyan Zhang; Jianwu Dang

arXiv:2206.12273·eess.AS·June 27, 2022

TL;DR

The paper introduces ISSL, an iterative method for sound source localization that accurately detects the number and directions of sources without relying on thresholds, outperforming existing algorithms.

Contribution

The novel ISSL approach uses an active source detector network to iteratively localize sources, handling unknown and varying source counts without threshold dependence.
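
The iterate-until-the-detector-says-stop control flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the spatial spectrum is assumed to be a 360-bin vector, each extracted source is removed by subtracting a Gaussian-shaped peak (an assumed residual update), and `has_active_source` is a hypothetical callable standing in for the trained active source detector network. The toy detector in the usage lines is a fixed energy rule used only so the sketch runs; it is threshold-like and therefore not representative of the learned classifier.

```python
import numpy as np

def gaussian_peak(doa_deg, n_bins=360, sigma_deg=8.0):
    """Gaussian-shaped peak centred on doa_deg, with circular wrap-around."""
    angles = np.arange(n_bins) * (360.0 / n_bins)
    diff = np.abs(angles - doa_deg)
    diff = np.minimum(diff, 360.0 - diff)  # circular distance in degrees
    return np.exp(-0.5 * (diff / sigma_deg) ** 2)

def iterative_localize(spectrum, has_active_source, max_iters=10, sigma_deg=8.0):
    """Extract one DOA per iteration until the detector reports no active source.

    has_active_source: hypothetical stand-in for the binary classifier that
    inspects the residual spatial spectrum (assumption, not the paper's network).
    """
    residual = np.asarray(spectrum, dtype=float).copy()
    n_bins = residual.size
    doas = []
    for _ in range(max_iters):
        if not has_active_source(residual):
            break  # termination criterion met
        peak_bin = int(np.argmax(residual))
        doa = peak_bin * 360.0 / n_bins
        doas.append(doa)
        # Assumed residual update: remove a Gaussian peak scaled to the peak height.
        removed = residual[peak_bin] * gaussian_peak(doa, n_bins, sigma_deg)
        residual = np.clip(residual - removed, 0.0, None)
    return doas, residual

# Usage with a synthetic two-source spectrum and a toy (non-learned) detector.
spectrum = 1.0 * gaussian_peak(60.0) + 0.8 * gaussian_peak(200.0)
toy_detector = lambda r: r.max() > 0.5  # placeholder only, NOT a trained network
doas, residual = iterative_localize(spectrum, toy_detector)
print(doas)
```

Passing the detector in as a callable keeps the loop independent of how the stop decision is made, so a trained classifier could replace the toy rule without changing the iteration logic.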

Findings

01. Significant improvement in DOA estimation accuracy.

02. More robust detection of the number of sources.

03. Effective handling of more sources than were seen during training.

Abstract

Sound source localization aims to find the direction of arrival (DOA) of all sound sources from the observed multi-channel audio. For the practical problem of an unknown number of sources, existing localization algorithms attempt to predict a likelihood-based coding (i.e., spatial spectrum) and employ a pre-determined threshold to detect the source number and the corresponding DOA values. However, these threshold-based algorithms are not stable, since their results depend on a careful choice of threshold. To address this problem, we propose an iterative sound source localization approach called ISSL, which iteratively extracts each source's DOA without a threshold until the termination criterion is met. Unlike threshold-based algorithms, ISSL designs an active source detector network based on a binary classifier that accepts the residual spatial spectrum and decides whether to stop the iteration. By doing…
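
The threshold sensitivity that the abstract attributes to existing methods can be seen in a toy example. The sketch below is an illustration under assumptions, not any specific published baseline: it builds a synthetic 360-bin spatial spectrum with one strong and one weak source (Gaussian peaks, an assumed shape) and applies plain peak picking with two different thresholds. The function names and the threshold values are hypothetical.

```python
import numpy as np

def circular_gaussian(center_deg, n_bins=360, sigma_deg=8.0):
    """Gaussian-shaped peak on a circular DOA grid (assumed spectrum shape)."""
    angles = np.arange(n_bins) * (360.0 / n_bins)
    diff = np.abs(angles - center_deg)
    diff = np.minimum(diff, 360.0 - diff)  # circular distance in degrees
    return np.exp(-0.5 * (diff / sigma_deg) ** 2)

def threshold_peak_pick(spectrum, threshold):
    """Return DOAs (degrees) of local maxima whose value exceeds the threshold."""
    s = np.asarray(spectrum, dtype=float)
    left = np.roll(s, 1)    # value of the previous bin (circular)
    right = np.roll(s, -1)  # value of the next bin (circular)
    is_peak = (s > left) & (s >= right) & (s > threshold)
    return (np.flatnonzero(is_peak) * 360.0 / s.size).tolist()

# One strong source at 60 degrees and one weak source at 200 degrees.
spectrum = 1.0 * circular_gaussian(60.0) + 0.4 * circular_gaussian(200.0)

low = threshold_peak_pick(spectrum, 0.2)   # both sources pass the threshold
high = threshold_peak_pick(spectrum, 0.5)  # the weak source is missed
print(len(low), len(high))
```

With the same spectrum, the estimated source count changes from two to one purely because of the threshold choice, which is the kind of dependence the abstract describes.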

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Blind Source Separation Techniques