Signal-informed DNN-based DOA Estimation combining an External Microphone and GCC-PHAT Features
Ulrik Kowalk, Simon Doclo, Joerg Bitzer

TL;DR
This paper introduces a signal-informed deep neural network approach for estimating the direction of arrival of a speaker in multi-talker environments, utilizing an external microphone and GCC-PHAT features to improve accuracy.
Contribution
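It presents a novel method that applies a binary mask, computed from the power distribution of the external microphone signal, to the GCC-PHAT input features of a convolutional neural network. This improves DOA estimation without requiring prior knowledge of the interfering speakers.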
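The following is a minimal sketch of how such a mask could be derived. It assumes a quantile threshold on the external microphone's short-time power: time-frequency bins whose power exceeds the chosen quantile are kept (1), and all others are discarded (0). The function names, the Hann-windowed STFT parameters (`n_fft=512`, `hop=256`) and the `quantile=0.8` rule are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np


def stft_power(x, n_fft=512, hop=256):
    """Power spectrogram |X(l, k)|^2 of a 1-D signal via a Hann-windowed STFT.

    Returns an array of shape (frames, n_fft // 2 + 1).
    """
    if len(x) < n_fft:
        raise ValueError("signal shorter than one STFT frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=-1)) ** 2


def external_mic_mask(x_ext, quantile=0.8, n_fft=512, hop=256):
    """Binary time-frequency mask from the external-microphone signal.

    Hypothetical rule: a bin is set to 1 if its power exceeds the given
    quantile of the power distribution over all bins, otherwise 0.
    """
    power = stft_power(x_ext, n_fft, hop)
    threshold = np.quantile(power, quantile)  # quantile over all TF bins
    return (power > threshold).astype(np.float32)
```

Because the external microphone is close to the desired speaker, its high-power bins are likely dominated by that speaker. Thresholding its power distribution can therefore act as a target-presence indicator without modelling the interferers. For a 1 s white-noise input at 16 kHz, the mask has shape (61, 257) and keeps roughly 20 % of the bins.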
Findings
Improved localization accuracy in a reverberant scenario with up to four interfering speakers.
Effective use of external microphone data for DOA estimation.
No need for prior information about interfering speakers.
Abstract
Aiming at estimating the direction of arrival (DOA) of a desired speaker in a multi-talker environment using a microphone array, in this paper we propose a signal-informed method exploiting the availability of an external microphone attached to the desired speaker. The proposed method applies a binary mask to the GCC-PHAT input features of a convolutional neural network, where the binary mask is computed based on the power distribution of the external microphone signal. Experimental results for a reverberant scenario with up to four interfering speakers demonstrate that the signal-informed masking improves the localization accuracy, without requiring any knowledge about the interfering speakers.
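The masked GCC-PHAT feature extraction can be sketched in NumPy. This sketch assumes the mask is applied per time-frequency bin to the PHAT-weighted cross-power spectrum of each array microphone pair before the inverse FFT. It also assumes the CNN input is the stack of per-frame correlations over lags -max_lag..max_lag. The names (`stft`, `masked_gcc_phat`) and parameters are hypothetical. The mask can be any binary array of shape (frames, bins), for example one obtained by thresholding the external-microphone power.

```python
import itertools

import numpy as np


def stft(x, n_fft=512, hop=256):
    """Complex Hann-windowed STFT of a 1-D signal, shape (frames, n_fft // 2 + 1)."""
    if len(x) < n_fft:
        raise ValueError("signal shorter than one STFT frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)


def masked_gcc_phat(mics, mask=None, n_fft=512, hop=256, max_lag=16, eps=1e-12):
    """Per-frame GCC-PHAT features for every microphone pair.

    mics: array (n_mics, n_samples) of array-microphone signals.
    mask: optional binary TF mask of shape (frames, n_fft // 2 + 1).
    Returns an array (n_pairs, frames, 2 * max_lag + 1) for lags -max_lag..max_lag.
    """
    if not 0 < max_lag < n_fft // 2:
        raise ValueError("max_lag must lie in (0, n_fft // 2)")
    specs = [stft(m, n_fft, hop) for m in mics]
    feats = []
    for i, j in itertools.combinations(range(len(specs)), 2):
        cross = specs[i] * np.conj(specs[j])  # cross-power spectrum X_i X_j^*
        cross /= np.abs(cross) + eps          # PHAT weighting: keep phase only
        if mask is not None:
            cross *= mask                      # zero out bins rejected by the mask
        cc = np.fft.irfft(cross, n=n_fft, axis=-1)
        # irfft puts negative lags at the end; reorder so lag 0 is centred
        cc = np.concatenate([cc[:, -max_lag:], cc[:, : max_lag + 1]], axis=-1)
        feats.append(cc)
    return np.stack(feats)
```

As a check, consider two channels where channel 0 is a 3-sample delayed copy of channel 1. The correlation summed over frames peaks at lag +3 (index `max_lag + 3`). Passing an all-zero mask removes every bin and yields all-zero features, so only the bins selected by the mask contribute to the DOA features.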
Taxonomy
Topics: Speech and Audio Processing · Indoor and Outdoor Localization Technologies · Underwater Acoustics Research
