SRP-DNN: Learning Direct-Path Phase Difference for Multiple Moving Sound Source Localization
Bing Yang, Hong Liu, Xiaofei Li

TL;DR
This paper introduces SRP-DNN, a deep learning approach that learns direct-path phase differences for localizing multiple moving sound sources, improving accuracy in noisy and reverberant environments.
Contribution
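The paper proposes a causal convolutional recurrent neural network that learns direct-path phase differences for each microphone pair. Its learning target is a weighted sum over sources, which avoids the assignment ambiguity and the uncertain output dimension that arise when multiple targets are predicted simultaneously.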
Findings
Outperforms existing methods in noisy environments
Effective in real-world reverberant scenarios
Accurately localizes multiple moving sources
Abstract
Multiple moving sound source localization in real-world scenarios remains a challenging issue due to interaction between sources, time-varying trajectories, distorted spatial cues, etc. In this work, we propose to use deep learning techniques to learn competing and time-varying direct-path phase differences for localizing multiple moving sound sources. A causal convolutional recurrent neural network is designed to extract the direct-path phase difference sequence from signals of each microphone pair. To avoid the assignment ambiguity and the problem of uncertain output-dimension encountered when simultaneously predicting multiple targets, the learning target is designed in a weighted sum format, which encodes source activity in the weight and direct-path phase differences in the summed value. The learned direct-path phase differences for all microphone pairs can be directly used to…
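The weighted-sum learning target described in the abstract can be illustrated with a minimal sketch for a single microphone pair and a single time frame. This is not the authors' implementation. It assumes the phase difference is represented by the cosine and sine of its value at each frequency, that it is derived from a time difference of arrival (TDOA) with the sign convention shown, and that source activity is a weight in [0, 1]. The function names, the frequency grid, and the TDOA values are hypothetical.

```python
import numpy as np


def direct_path_phase_difference(tdoa_s, freqs_hz):
    """Encode the direct-path phase difference of one source as [cos, sin].

    Assumption: phase = -2*pi*f*tdoa (sign convention chosen for illustration).
    """
    phase = -2.0 * np.pi * np.asarray(freqs_hz) * tdoa_s
    return np.concatenate([np.cos(phase), np.sin(phase)])


def weighted_sum_target(tdoas_s, activities, freqs_hz):
    """Sum the per-source encodings, each weighted by that source's activity.

    The output size depends only on the number of frequencies, not on the
    number of sources, so no source-to-output assignment is needed.
    """
    target = np.zeros(2 * len(freqs_hz))
    for tdoa_s, activity in zip(tdoas_s, activities):
        target += activity * direct_path_phase_difference(tdoa_s, freqs_hz)
    return target


# Hypothetical example: two candidate sources, only the first one active.
freqs = np.linspace(0.0, 8000.0, 257)
target = weighted_sum_target([2.0e-4, -1.0e-4], [1.0, 0.0], freqs)
```

Because the inactive source has weight zero, the target here reduces to the encoding of the active source alone. With two active sources the same fixed-length vector holds the sum of both encodings.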
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Acoustic Wave Phenomena Research
