Multi-Speaker Localization Using Convolutional Neural Network Trained with Noise

Soumitro Chakrabarty; Emanuël A. P. Habets

arXiv:1712.04276·cs.SD·December 13, 2017·36 cites

Open Access

TL;DR

This paper introduces a CNN-based approach to multi-speaker localization that is trained on synthesized noise signals. It is evaluated on scenarios with two speakers and outperforms a conventional steered response power method.

Contribution

The paper presents a novel method for training a CNN for multi-speaker localization on synthesized noise signals, which simplifies training-data preparation and improves performance over the steered response power baseline.

Findings

01

Effective localization of two speakers demonstrated

02

Outperforms traditional steered response power method

03

Training with synthesized noise is viable

Abstract

The problem of multi-speaker localization is formulated as a multi-class multi-label classification problem, which is solved using a convolutional neural network (CNN) based source localization method. Utilizing the common assumption of disjoint speaker activities, we propose a novel method to train the CNN using synthesized noise signals. The proposed localization method is evaluated for two speakers and compared to a well-known steered response power method.
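The multi-class multi-label formulation can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that directions of arrival (DOAs) are discretized onto a grid of classes, that each active speaker switches on one class in a multi-hot target, and that the network emits one independent probability per class. The 5-degree grid over 0 to 180 degrees and the names `encode_doas`, `binary_cross_entropy`, `decode_doas`, and `mix_disjoint` are all hypothetical. `mix_disjoint` shows one plausible reading of the disjoint-activity assumption: each frequency bin of a two-source training input is taken from exactly one of two single-source noise phase maps.

```python
import numpy as np

def encode_doas(doas_deg, resolution_deg=5.0, max_deg=180.0):
    """Multi-hot target: one class per DOA grid point (assumed grid)."""
    n_classes = int(round(max_deg / resolution_deg)) + 1
    target = np.zeros(n_classes, dtype=np.float32)
    for doa in doas_deg:
        target[int(round(doa / resolution_deg))] = 1.0
    return target

def binary_cross_entropy(probs, target, eps=1e-7):
    """Mean per-class binary cross-entropy, the usual multi-label loss."""
    probs = np.clip(np.asarray(probs, dtype=np.float64), eps, 1.0 - eps)
    target = np.asarray(target, dtype=np.float64)
    return float(-np.mean(target * np.log(probs)
                          + (1.0 - target) * np.log(1.0 - probs)))

def decode_doas(probs, n_speakers, resolution_deg=5.0):
    """Pick the n_speakers most probable classes and map them to degrees."""
    top = np.argsort(probs)[-n_speakers:]
    return sorted(float(i * resolution_deg) for i in top)

def mix_disjoint(phase_a, phase_b, rng):
    """Assumed illustration of disjoint activity: every frequency bin
    (column) comes entirely from source A or entirely from source B."""
    use_a = rng.random(phase_a.shape[1]) < 0.5   # one choice per bin
    return np.where(use_a[np.newaxis, :], phase_a, phase_b), use_a

# Two speakers at 30 and 120 degrees on the assumed 37-class grid.
target = encode_doas([30.0, 120.0])
estimated = decode_doas(target, n_speakers=2)

# Hypothetical phase maps: 4 microphones x 8 frequency bins of noise phase.
rng = np.random.default_rng(0)
phase_a = rng.uniform(-np.pi, np.pi, size=(4, 8))
phase_b = rng.uniform(-np.pi, np.pi, size=(4, 8))
mixed, use_a = mix_disjoint(phase_a, phase_b, rng)
```

In this sketch the number of speakers is passed to `decode_doas` explicitly. Thresholding the per-class probabilities would be an alternative when the speaker count is unknown.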

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing