Direction of arrival estimation for multiple sound sources using convolutional recurrent neural network
Sharath Adavanne, Archontis Politis, Tuomas Virtanen

TL;DR
This paper introduces a deep neural network, DOAnet, that estimates the directions of multiple sound sources directly from spectrogram magnitudes and phases, performing well across various reverberant environments.
Contribution
The paper presents a novel stacked convolutional and recurrent neural network that estimates multiple sound source DOAs without explicit feature extraction, improving accuracy and robustness.
Findings
Estimates the DOAs of multiple concurrent sound sources with good precision in anechoic, matched reverberant, and unmatched reverberant conditions.
Generates high signal-to-noise ratio spatial pseudo-spectra.
Capable of estimating the number of sources and their directions.
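One common way to read both a source count and a set of directions off a spatial pseudo-spectrum is to threshold it and pick local maxima. The sketch below illustrates that idea only; this summary does not describe how DOAnet derives its estimates. The function name `pick_peaks`, the 0.5 threshold, the 10-degree azimuth grid, and the synthetic two-source spectrum are all assumptions made for the example, and it covers azimuth only, whereas the paper estimates azimuth and elevation.

```python
import numpy as np

def pick_peaks(sps, threshold=0.5):
    """Indices of circular local maxima of a 1-D pseudo-spectrum above `threshold`.

    Hypothetical helper for illustration; not taken from the paper.
    """
    sps = np.asarray(sps, dtype=float)
    left = np.roll(sps, 1)    # neighbour at the previous azimuth bin (wraps around)
    right = np.roll(sps, -1)  # neighbour at the next azimuth bin (wraps around)
    is_peak = (sps > threshold) & (sps > left) & (sps >= right)
    return np.flatnonzero(is_peak)

# Synthetic pseudo-spectrum on an assumed 10-degree azimuth grid,
# with two Gaussian bumps standing in for sources at 40 and 200 degrees.
azimuths = np.arange(0, 360, 10)
sps = (np.exp(-0.5 * ((azimuths - 40) / 15.0) ** 2)
       + 0.8 * np.exp(-0.5 * ((azimuths - 200) / 15.0) ** 2))

peaks = pick_peaks(sps)
estimated_doas = azimuths[peaks].tolist()
num_sources = len(estimated_doas)
print(num_sources, estimated_doas)  # 2 [40, 200]
```

The threshold trades missed sources against false detections, so in practice it would need tuning on held-out data.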
Abstract
This paper proposes a deep neural network for estimating the directions of arrival (DOA) of multiple sound sources. The proposed stacked convolutional and recurrent neural network (DOAnet) generates a spatial pseudo-spectrum (SPS) along with the DOA estimates in both azimuth and elevation. We avoid any explicit feature extraction step by using the magnitudes and phases of the spectrograms of all the channels as input to the network. The proposed DOAnet is evaluated by estimating the DOAs of multiple concurrently present sources in anechoic, matched and unmatched reverberant conditions. The results show that the proposed DOAnet is capable of estimating the number of sources and their respective DOAs with good precision and of generating an SPS with a high signal-to-noise ratio.
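The input described in the abstract, spectrogram magnitudes and phases for every channel, can be sketched as follows. This is a minimal illustration and not the authors' code: the four-channel signal, the FFT size of 512, the hop of 256, the Hann window, and the function names `stft` and `magnitude_phase_features` are assumptions, since the summary gives no such parameters.

```python
import numpy as np

def stft(x, n_fft=512, hop=256):
    """Hann-windowed STFT of a 1-D signal; returns (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)

def magnitude_phase_features(audio, n_fft=512, hop=256):
    """Map audio of shape (channels, samples) to (frames, bins, 2 * channels).

    The last axis holds the magnitudes of all channels followed by
    the phases of all channels, with no further feature extraction.
    """
    specs = np.stack([stft(ch, n_fft, hop) for ch in audio], axis=-1)
    return np.concatenate([np.abs(specs), np.angle(specs)], axis=-1)

# Assumed example input: 4 channels of random noise, 4096 samples each.
rng = np.random.default_rng(0)
audio = rng.standard_normal((4, 4096))
features = magnitude_phase_features(audio)
print(features.shape)  # (15, 257, 8)
```

Keeping the phases alongside the magnitudes preserves the inter-channel phase information in the network input instead of reducing it to hand-crafted features.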
