Learning Deep Direct-Path Relative Transfer Function for Binaural Sound Source Localization
Bing Yang, Hong Liu, Xiaofei Li

TL;DR
This paper introduces a deep learning approach to robustly estimate the direct-path relative transfer function (DP-RTF) for binaural sound source localization, improving accuracy in noisy and reverberant environments.
Contribution
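It proposes a DP-RTF learning network that pairs a branched convolutional neural network module, which extracts the inter-channel magnitude and phase patterns separately, with a convolutional recurrent neural network module for joint feature learning. A monaural speech enhancement network enhances the speech spectra to aid DP-RTF estimation. The design generalizes across different binaural arrays without retraining.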
Findings
Effective in noisy and reverberant conditions
Generalizes well to unseen binaural arrays
Improves direction-of-arrival (DOA) estimation accuracy
Abstract
Direct-path relative transfer function (DP-RTF) refers to the ratio between the direct-path acoustic transfer functions of two microphone channels. Though DP-RTF fully encodes the sound spatial cues and serves as a reliable localization feature, it is often erroneously estimated in the presence of noise and reverberation. This paper proposes to learn DP-RTF with deep neural networks for robust binaural sound source localization. A DP-RTF learning network is designed to regress the binaural sensor signals to a real-valued representation of DP-RTF. It consists of a branched convolutional neural network module to separately extract the inter-channel magnitude and phase patterns, and a convolutional recurrent neural network module for joint feature learning. To better explore the speech spectra to aid the DP-RTF estimation, a monaural speech enhancement network is used to recover the…
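The abstract defines DP-RTF as a ratio of direct-path transfer functions and says the network regresses a real-valued representation of it. The following Python sketch (standard library only) illustrates both ideas under a deliberately simple, assumed free-field model: two omnidirectional microphones with no head between them, a hypothetical 0.18 m spacing, a hypothetical set of frequency bins, and the left channel taken as reference. The real-valued representation (log-magnitude ratio plus cosine and sine of the inter-channel phase) and the grid-search template matching that turns it into an azimuth are illustrative assumptions, not the paper's network or its exact feature definition.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed
MIC_SPACING = 0.18      # m, hypothetical ear-to-ear distance
FREQS = [250.0 * k for k in range(1, 17)]  # hypothetical bins: 250 Hz ... 4 kHz


def direct_path_tfs(azimuth_deg, freqs=FREQS):
    """Free-field direct-path transfer functions (left, right) for a plane wave.

    Assumed model: two omnidirectional mics on the interaural axis, no head
    shadowing, so each channel is a pure delay relative to the array centre.
    Positive azimuth means the source is to the right.
    """
    tdoa = MIC_SPACING * math.sin(math.radians(azimuth_deg)) / SPEED_OF_SOUND
    tau_left, tau_right = tdoa / 2.0, -tdoa / 2.0  # right mic hears it first
    h_left = [cmath.exp(-2j * math.pi * f * tau_left) for f in freqs]
    h_right = [cmath.exp(-2j * math.pi * f * tau_right) for f in freqs]
    return h_left, h_right


def dp_rtf(h_left, h_right):
    """DP-RTF per frequency bin: right over left direct-path transfer function
    (using the left channel as reference is an assumption of this sketch)."""
    return [hr / hl for hl, hr in zip(h_left, h_right)]


def real_features(rtf):
    """Hypothetical real-valued DP-RTF representation: log-magnitude ratio plus
    cos/sin of the inter-channel phase difference for every frequency bin."""
    feats = []
    for r in rtf:
        phase = cmath.phase(r)
        feats.extend([math.log(abs(r)), math.cos(phase), math.sin(phase)])
    return feats


def estimate_azimuth(observed_feats, candidates=range(-90, 91, 5)):
    """Grid search: return the candidate azimuth whose template features are
    closest (squared Euclidean distance) to the observed features."""
    def distance(az):
        template = real_features(dp_rtf(*direct_path_tfs(az)))
        return sum((a - b) ** 2 for a, b in zip(observed_feats, template))
    return min(candidates, key=distance)


if __name__ == "__main__":
    observed = real_features(dp_rtf(*direct_path_tfs(30)))
    print(estimate_azimuth(observed))  # 30 in this noise-free toy case
```

In this noise-free toy case the observed features match the 30° template exactly. With real binaural recordings, noise and reverberation corrupt a DP-RTF estimated from the signals, which is the problem the paper tackles by learning DP-RTF with a neural network; a real binaural setup would also replace the free-field model with measured head-related transfer functions.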
Taxonomy
Topics: Speech and Audio Processing · Acoustic Wave Phenomena Research · Music and Audio Processing
