Multi-Speaker DOA Estimation Using Deep Convolutional Networks Trained with Noise Signals
Soumitro Chakrabarty, Emanuël A. P. Habets

TL;DR
This paper introduces a CNN-based supervised method for multi-speaker DOA estimation that is trained with synthesized noise signals, demonstrating robustness and adaptability in various acoustic environments.
Contribution
The paper proposes a novel CNN training approach using synthesized noise signals for multi-speaker DOA estimation, enhancing robustness and adaptability to different acoustic conditions.
Findings
Effective localization in unseen acoustic environments
Robust to different noise types
Optimal performance with M-1 convolution layers for M microphones
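One possible reading of the M-1 finding, sketched as shape arithmetic: if each convolution layer uses a kernel that spans two adjacent microphones with no padding (an assumption, since this summary does not state the kernel size), the microphone axis shrinks by one per layer, so M-1 layers are the point at which every output unit has seen all M microphones. The function name `mic_axis_after` is hypothetical.

```python
def mic_axis_after(n_mics, n_layers, kernel=2):
    """Size of the microphone axis after stacking unpadded convolutions.

    Assumes each layer applies a kernel of length `kernel` along the
    microphone axis with stride 1 and no padding (an illustrative choice).
    """
    size = n_mics
    for _ in range(n_layers):
        size = size - kernel + 1  # unpadded, stride-1 convolution
        if size < 1:
            raise ValueError("more layers than the microphone axis allows")
    return size


# With M = 4 microphones, M - 1 = 3 layers collapse the axis to a single row.
remaining = mic_axis_after(n_mics=4, n_layers=3)
```

Under the same assumption, fewer than M-1 layers leave more than one row on the microphone axis, meaning no single unit has combined phase information from the whole array.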
Abstract
Supervised learning based methods for source localization, being data driven, can be adapted to different acoustic conditions via training and have been shown to be robust to adverse acoustic environments. In this paper, a convolutional neural network (CNN) based supervised learning method for estimating the direction-of-arrival (DOA) of multiple speakers is proposed. Multi-speaker DOA estimation is formulated as a multi-class multi-label classification problem, where the assignment of each DOA label to the input feature is treated as a separate binary classification problem. The phase component of the short-time Fourier transform (STFT) coefficients of the received microphone signals is directly fed into the CNN, and the features for DOA estimation are learnt during training. Utilizing the assumption of disjoint speaker activity in the STFT domain, a novel method is proposed to train…
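The input feature and the multi-label formulation described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the Hann window, the 5-degree DOA grid over 0 to 180 degrees, the frame length, and the helper names `phase_map`, `multi_hot`, and `pick_doas` are all assumptions made for illustration, and the top-scores read-out is just one simple way to turn per-class scores into DOA estimates.

```python
import numpy as np


def phase_map(frame, window=None):
    """Phase of the one-sided DFT of one time frame per microphone.

    frame: array of shape (M, N), M microphones, N samples.
    Returns an array of shape (M, N // 2 + 1) with values in [-pi, pi].
    """
    frame = np.asarray(frame, dtype=float)
    if window is None:
        window = np.hanning(frame.shape[1])  # assumed analysis window
    spectrum = np.fft.rfft(frame * window, axis=1)
    return np.angle(spectrum)


def multi_hot(doas_deg, resolution_deg=5, max_deg=180):
    """Multi-label target: one independent binary label per DOA class."""
    n_classes = max_deg // resolution_deg + 1  # assumed DOA grid
    target = np.zeros(n_classes, dtype=np.float32)
    idx = np.rint(np.asarray(doas_deg) / resolution_deg).astype(int)
    target[idx] = 1.0
    return target


def pick_doas(scores, n_speakers, resolution_deg=5):
    """Return the DOAs (degrees) of the n_speakers highest-scoring classes."""
    top = np.argsort(scores)[::-1][:n_speakers]
    return sorted(int(i) * resolution_deg for i in top)


# Noise frame for a hypothetical 4-microphone array, 512 samples per frame.
rng = np.random.default_rng(0)
features = phase_map(rng.standard_normal((4, 512)))
target = multi_hot([45, 120])
estimates = pick_doas(target, n_speakers=2)
```

Because each class gets its own binary label, a target with two active speakers simply has two entries set to one, and a network trained against such targets would typically use a sigmoid output per class rather than a softmax over all classes.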
