Direction of Arrival Estimation of Noisy Speech Using Convolutional Recurrent Neural Networks with Higher-Order Ambisonics Signals

Nils Poschadel; Robert Hupke; Stephan Preihs; Jürgen Peissig

arXiv:2102.09853·eess.AS·May 7, 2021·EUSIPCO

Open Access

TL;DR

This study examines whether higher-order Ambisonics signals improve neural-network-based direction of arrival estimation for speech. Higher orders helped on simulated data but brought no further gain on real data from order two onwards, and extracting meaningful features mattered more than raising the order.

Contribution

The paper investigates how Ambisonics orders up to four affect DOA estimation with convolutional recurrent neural networks, and shows that feature extraction matters more than simply increasing the order.

Findings

01

Higher Ambisonics orders improve localization on simulated data.

02

No further improvement on real data from order two onwards.

03

First-order intensity vector features outperform pure higher-order magnitude and phase features in almost all scenarios.

Abstract

Training convolutional recurrent neural networks on first-order Ambisonics signals is a well-known approach when estimating the direction of arrival for speech/sound signals. In this work, we investigate whether increasing the order of Ambisonics up to the fourth order further improves the estimation performance of convolutional recurrent neural networks. While our results on data based on simulated spatial room impulse responses show that the use of higher Ambisonics orders does have the potential to provide better localization results, no further improvement was shown on data based on real spatial room impulse responses from order two onwards. Rather, it seems to be crucial to extract meaningful features from the raw data. First-order features derived from the acoustic intensity vector were superior to pure higher-order magnitude and phase features in almost all scenarios.
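
To illustrate what "features derived from the acoustic intensity vector" can look like, the following is a minimal sketch and not the authors' feature pipeline. It assumes first-order Ambisonics STFT coefficients (W, X, Y, Z) for each time-frequency bin, an encoding in which a plane wave from unit direction u yields X, Y, Z equal to W scaled by the components of u, and a noise-free, anechoic signal. Under that assumed convention the real part of conj(W) times (X, Y, Z) points toward the source. All function names are hypothetical.

```python
import math

def intensity_vector(w, x, y, z):
    """Pseudo-intensity (up to a scale factor) for one time-frequency bin.

    w, x, y, z are complex STFT coefficients of the first-order channels.
    """
    wc = w.conjugate()
    return ((wc * x).real, (wc * y).real, (wc * z).real)

def doa_from_bins(bins):
    """Sum the intensity vectors of all bins and return (azimuth, elevation) in degrees."""
    sx = sy = sz = 0.0
    for w, x, y, z in bins:
        ix, iy, iz = intensity_vector(w, x, y, z)
        sx += ix
        sy += iy
        sz += iz
    azimuth = math.degrees(math.atan2(sy, sx))
    elevation = math.degrees(math.atan2(sz, math.hypot(sx, sy)))
    return azimuth, elevation

def encode_plane_wave(s, azimuth_deg, elevation_deg):
    """Hypothetical encoder (assumed convention): one bin of a plane wave with spectrum value s."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (
        s,
        s * math.cos(az) * math.cos(el),
        s * math.sin(az) * math.cos(el),
        s * math.sin(el),
    )

# Two synthetic bins from the same direction (illustrative values only).
bins = [
    encode_plane_wave(complex(1.0, 2.0), 30.0, 10.0),
    encode_plane_wave(complex(-0.5, 0.3), 30.0, 10.0),
]
azimuth, elevation = doa_from_bins(bins)
```

In a learning-based setup such per-bin vectors would typically be fed to the network as input features rather than averaged directly. The averaging here only shows that the vectors carry the direction information.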

Taxonomy

Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Advanced Adaptive Filtering Techniques