Multi-Speaker DOA Estimation Using Deep Convolutional Networks Trained with Noise Signals

Soumitro Chakrabarty, Emanuël A. P. Habets

arXiv:1807.11722 · eess.AS · May 22, 2019


TL;DR

This paper introduces a CNN-based supervised method for multi-speaker DOA estimation that is trained with synthesized noise signals, demonstrating robustness and adaptability in various acoustic environments.

Contribution

The paper proposes a novel CNN training approach using synthesized noise signals for multi-speaker DOA estimation, enhancing robustness and adaptability to different acoustic conditions.

Findings

01. Effective localization in unseen acoustic environments
02. Robust to different noise types
03. Optimal performance with M-1 convolution layers for M microphones
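One way to read the third finding is as receptive-field arithmetic along the microphone axis. The sketch below assumes small kernels of height 2 on that axis, applied with stride 1 and no padding; the kernel size and the helper names are illustrative assumptions, not details stated in this summary. Under those assumptions, each layer relates one more neighbouring microphone, so M-1 layers are the fewest that let a single output unit see all M channels.

```python
import math


def layers_to_span_all_mics(num_mics: int, kernel: int = 2) -> int:
    """Fewest stacked stride-1 conv layers whose receptive field covers all microphones.

    Assumption (not from the source): every layer uses the same kernel height
    along the microphone axis. The receptive field after L layers is
    1 + L * (kernel - 1).
    """
    if num_mics < 1 or kernel < 2:
        raise ValueError("need num_mics >= 1 and kernel >= 2")
    return math.ceil((num_mics - 1) / (kernel - 1))


def mic_axis_size_after(num_mics: int, num_layers: int, kernel: int = 2) -> int:
    """Length of the microphone axis after num_layers unpadded, stride-1 convolutions."""
    size = num_mics
    for _ in range(num_layers):
        if size < kernel:
            raise ValueError("kernel no longer fits along the microphone axis")
        size = size - kernel + 1
    return size


if __name__ == "__main__":
    for m in (2, 4, 8):
        n = layers_to_span_all_mics(m)
        print(f"M={m}: {n} layers, microphone axis shrinks to {mic_axis_size_after(m, n)}")
```

With a kernel height of 2 the count comes out to exactly M-1, which matches the figure quoted in the finding; a larger kernel would need fewer layers under the same arithmetic.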

Abstract

Supervised-learning-based methods for source localization, being data-driven, can be adapted to different acoustic conditions via training and have been shown to be robust to adverse acoustic environments. In this paper, a convolutional neural network (CNN) based supervised learning method for estimating the direction-of-arrival (DOA) of multiple speakers is proposed. Multi-speaker DOA estimation is formulated as a multi-class multi-label classification problem, where the assignment of each DOA label to the input feature is treated as a separate binary classification problem. The phase component of the short-time Fourier transform (STFT) coefficients of the received microphone signals is fed directly into the CNN, and the features for DOA estimation are learnt during training. Utilizing the assumption of disjoint speaker activity in the STFT domain, a novel method is proposed to train…
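The input and output described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the frame length, the Hann window, the 0.5 threshold, the top-N selection rule, and all function names are assumptions made for the example, and the CNN itself is left out. It shows a per-frame phase map of shape (microphones × frequency bins) and a multi-label decision in which each DOA class carries its own independent probability.

```python
import numpy as np


def phase_map(x: np.ndarray, frame_len: int = 512, start: int = 0) -> np.ndarray:
    """Phase of the STFT coefficients for one time frame.

    x has shape (M, T): M microphone signals of T samples each.
    Returns an array of shape (M, frame_len // 2 + 1) with values in [-pi, pi].
    The Hann window and frame length are illustrative assumptions.
    """
    segment = x[:, start:start + frame_len] * np.hanning(frame_len)
    return np.angle(np.fft.rfft(segment, axis=-1))


def binary_cross_entropy(posteriors: np.ndarray, labels: np.ndarray, eps: float = 1e-12) -> float:
    """Mean of the per-class binary losses: one binary problem per DOA label."""
    p = np.clip(posteriors, eps, 1.0 - eps)
    return float(-np.mean(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p)))


def pick_doas(posteriors, num_speakers=None, threshold: float = 0.5) -> np.ndarray:
    """Turn per-class probabilities into a set of DOA class indices.

    If num_speakers is given, keep that many highest-scoring classes;
    otherwise keep every class at or above the threshold. Both rules are
    assumptions made for this sketch.
    """
    posteriors = np.asarray(posteriors, dtype=float)
    if num_speakers is not None:
        return np.sort(np.argsort(posteriors)[-num_speakers:])
    return np.flatnonzero(posteriors >= threshold)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mics = rng.standard_normal((4, 512))   # 4 microphones, white-noise stand-in signal
    print(phase_map(mics).shape)           # one CNN input: microphones x frequency bins
    print(pick_doas([0.1, 0.9, 0.2, 0.8], num_speakers=2))
```

Because the classes are scored independently, two speakers simply appear as two classes with high probability in the same output vector, which is what distinguishes this from a single softmax over directions.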
