Multi-Speaker Localization Using Convolutional Neural Network Trained with Noise
Soumitro Chakrabarty, Emanuël A. P. Habets

TL;DR
This paper introduces a CNN-based approach to multi-speaker localization that is trained with synthesized noise signals. The approach is shown to be effective for two speakers and to outperform a well-known steered response power (SRP) method.
Contribution
The paper presents a novel method for training a CNN for multi-speaker localization on synthesized noise signals, which simplifies the preparation of training data and improves localization performance.
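A minimal sketch of how single-source noise training examples could be generated. It assumes a free-field, far-field uniform linear array (ULA) and uses the STFT phase of each microphone as the network input. If the disjoint-activity assumption is read as "each time-frequency bin is dominated by at most one speaker", then single-source noise frames with one-hot DOA labels can serve as training examples. The array geometry (4 microphones, 8 cm spacing), sampling rate, FFT size, the 0 to 180 degree grid in 5 degree steps, the phase-map input, and the names `doa_grid` and `synth_noise_frame` are all hypothetical choices, not details stated in the abstract.

```python
import cmath
import math
import random

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def doa_grid(step_deg=5):
    """Hypothetical DOA class grid: 0 to 180 degrees in step_deg increments."""
    return list(range(0, 181, step_deg))


def synth_noise_frame(doa_deg, num_mics=4, spacing=0.08, fs=16000, nfft=256, rng=None):
    """Simulate one STFT frame of a single far-field white-noise source.

    Returns (phase_map, label): phase_map[m][k] is the phase of microphone m
    at frequency bin k, and label is a one-hot vector over doa_grid().
    Free-field propagation only; no reverberation or sensor noise is modeled.
    All default parameters are hypothetical.
    """
    rng = rng or random.Random()
    grid = doa_grid()
    if doa_deg not in grid:
        raise ValueError("doa_deg must lie on the DOA grid")
    label = [1 if d == doa_deg else 0 for d in grid]
    num_bins = nfft // 2 + 1
    phase_map = [[0.0] * num_bins for _ in range(num_mics)]
    for k in range(num_bins):
        freq = k * fs / nfft
        # White-noise source coefficient: i.i.d. Gaussian real and imaginary parts.
        source = complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        for m in range(num_mics):
            # Far-field delay of microphone m relative to microphone 0,
            # with the DOA measured from the array axis.
            tau = m * spacing * math.cos(math.radians(doa_deg)) / SPEED_OF_SOUND
            phase_map[m][k] = cmath.phase(source * cmath.exp(-2j * math.pi * freq * tau))
    return phase_map, label
```

For example, `synth_noise_frame(60, rng=random.Random(0))` returns a 4 x 129 phase map and a 37-element label whose single 1 sits at index 12. Drawing a fresh random source spectrum per call means the network can only learn the inter-microphone phase relations, not the signal content.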
Findings
Effective localization of two speakers demonstrated
Outperforms a well-known steered response power (SRP) method
Training with synthesized noise is viable
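A steered response power baseline can be sketched as SRP with phase transform (PHAT) weighting, evaluated on a single STFT frame from a uniform linear array. Which SRP variant the paper uses is not stated here, so SRP-PHAT, the array geometry, the 5 degree grid, and the function names `srp_phat_map` and `srp_phat_doa` are assumptions. The power for a candidate angle sums, over all microphone pairs and frequency bins, the real part of the PHAT-normalized cross-spectrum after compensating the inter-microphone delay expected for that angle.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def srp_phat_map(spectra, spacing, fs, nfft, grid_deg):
    """SRP-PHAT power for each candidate DOA (single STFT frame, ULA).

    spectra[m][k] is the complex STFT coefficient of microphone m at bin k.
    """
    num_mics = len(spectra)
    num_bins = len(spectra[0])
    powers = []
    for theta in grid_deg:
        cos_t = math.cos(math.radians(theta))
        total = 0.0
        for k in range(1, num_bins):  # skip DC, which carries no delay information
            freq = k * fs / nfft
            for m in range(num_mics):
                for n in range(m + 1, num_mics):
                    cross = spectra[m][k] * spectra[n][k].conjugate()
                    if cross == 0:
                        continue
                    # Delay between microphones m and n expected for a source at theta.
                    tdoa = (m - n) * spacing * cos_t / SPEED_OF_SOUND
                    # PHAT keeps only the cross-spectrum phase; the exponential steers it.
                    total += (cross / abs(cross) * cmath.exp(2j * math.pi * freq * tdoa)).real
        powers.append(total)
    return powers


def srp_phat_doa(spectra, spacing=0.08, fs=16000, nfft=256, step_deg=5):
    """Return the grid DOA (degrees) with the largest SRP-PHAT power.

    Default parameters are hypothetical.
    """
    grid = list(range(0, 181, step_deg))
    powers = srp_phat_map(spectra, spacing, fs, nfft, grid)
    return grid[max(range(len(grid)), key=powers.__getitem__)]
```

For a single free-field source, every PHAT-normalized term equals 1 at the true angle, so the map peaks there. For two speakers, one would instead take the two largest peaks of the power map (or accumulate it over frames first) rather than the single maximum.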
Abstract
The problem of multi-speaker localization is formulated as a multi-class multi-label classification problem, which is solved using a convolutional neural network (CNN) based source localization method. Utilizing the common assumption of disjoint speaker activities, we propose a novel method to train the CNN using synthesized noise signals. The proposed localization method is evaluated for two speakers and compared to a well-known steered response power method.
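One plausible reading of the multi-class multi-label formulation at inference time: the CNN emits one posterior per candidate DOA class for every frame, and averaging these posteriors over frames and picking the two largest peaks yields the two speaker estimates. The averaging-plus-peak-picking rule, the 37-class grid, and the function names below are assumptions for illustration, not the authors' stated decoding procedure.

```python
def average_posteriors(frame_posteriors):
    """Average per-frame class posteriors over all frames."""
    num_frames = len(frame_posteriors)
    return [sum(col) / num_frames for col in zip(*frame_posteriors)]


def pick_peaks(scores, num_sources):
    """Indices of the num_sources largest local maxima, in ascending order."""
    peaks = []
    for i, s in enumerate(scores):
        left = scores[i - 1] if i > 0 else float("-inf")
        right = scores[i + 1] if i + 1 < len(scores) else float("-inf")
        # '>=' on the left and '>' on the right: a flat plateau yields its right-most index.
        if s >= left and s > right:
            peaks.append(i)
    peaks.sort(key=lambda i: scores[i], reverse=True)
    return sorted(peaks[:num_sources])


def estimate_doas(frame_posteriors, grid_deg, num_sources=2):
    """Map averaged multi-label CNN outputs to num_sources DOA estimates."""
    avg = average_posteriors(frame_posteriors)
    return [grid_deg[i] for i in pick_peaks(avg, num_sources)]
```

Picking local maxima rather than the two largest raw scores prevents both estimates from landing on adjacent classes of the same speaker's peak. With frames that alternately peak at 30 and 120 degrees on a 0 to 180 degree grid in 5 degree steps, `estimate_doas` returns `[30, 120]`.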
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
