Learning Multiple Sound Source 2D Localization
Guillaume Le Moing, Phongtharin Vinayavekhin, Tadanobu Inoue, Jayakorn Vongkulbhisal, Asim Munawar, Ryuki Tachibana, Don Joven Agravante

TL;DR
This paper introduces deep learning algorithms for accurately localizing multiple sound sources in 2D space using microphone arrays, with novel representations and metrics validated on synthetic and real data.
Contribution
It presents new deep learning architectures, localization representations, and evaluation metrics for multi-source sound localization in enclosed environments.
Findings
Improved localization accuracy over baseline methods
Effective on both synthetic and real-world data
New metrics enable better comparison of approaches
Abstract
In this paper, we propose novel deep-learning-based algorithms for multiple sound source localization. Specifically, we aim to find the 2D Cartesian coordinates of multiple sound sources in an enclosed environment by using multiple microphone arrays. To this end, we use an encoding-decoding architecture and propose two improvements on it to accomplish the task. In addition, we propose two novel localization representations which increase the accuracy. Lastly, we develop new metrics that rely on resolution-based multiple source association, which enables us to evaluate and compare different localization approaches. We tested our method on both synthetic and real-world data. The results show that our method improves upon the previous baseline approach for this problem.
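The abstract does not spell out how resolution-based association is computed. As a minimal sketch of one plausible reading, the Python below pairs each predicted 2D position with at most one true source position when the two lie within a chosen resolution (a distance threshold), then reports precision, recall, and F1. The function names `associate` and `precision_recall_f1`, the greedy nearest-first pairing, and the sample coordinates are all assumptions made for illustration, not the paper's definitions.

```python
import math


def associate(predicted, actual, resolution):
    """Greedily pair predicted and actual 2D points lying within `resolution`.

    Hypothetical helper: nearest pairs are matched first, and each point is
    used at most once. The paper's own association rule may differ.
    """
    # All pairwise Euclidean distances, sorted from closest to farthest.
    pairs = sorted(
        (math.dist(p, a), i, j)
        for i, p in enumerate(predicted)
        for j, a in enumerate(actual)
    )
    used_pred, used_actual, matches = set(), set(), []
    for distance, i, j in pairs:
        if distance > resolution:
            break  # every remaining pair is farther apart than the threshold
        if i in used_pred or j in used_actual:
            continue  # one of the two points is already matched
        used_pred.add(i)
        used_actual.add(j)
        matches.append((i, j, distance))
    return matches


def precision_recall_f1(predicted, actual, resolution):
    """Score a set of predicted source positions at a given resolution."""
    true_positives = len(associate(predicted, actual, resolution))
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustrative coordinates in metres (made up, not taken from the paper).
actual_sources = [(1.0, 1.0), (3.0, 2.0)]
predicted_sources = [(1.1, 1.0), (3.0, 2.6), (0.0, 4.0)]

for resolution in (0.3, 1.0):
    print(resolution, precision_recall_f1(predicted_sources, actual_sources, resolution))
```

Scoring the same predictions at several resolutions shows how the threshold affects the result: a coarser resolution accepts the second prediction as a match, which raises both precision and recall.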
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Hearing Loss and Rehabilitation
