Learning Deep Direct-Path Relative Transfer Function for Binaural Sound Source Localization

Bing Yang; Hong Liu; Xiaofei Li

arXiv:2202.07841 · cs.SD · February 17, 2022 · 1 citation

Open Access

TL;DR

This paper introduces a deep learning approach to robustly estimate the direct-path relative transfer function (DP-RTF) for binaural sound source localization, improving accuracy in noisy and reverberant environments.

Contribution

It proposes a novel neural network architecture that jointly learns inter-channel features and enhances speech spectra, enabling generalization across different binaural arrays without retraining.

Findings

01 Effective in noisy and reverberant conditions

02 Generalizes well to unseen binaural arrays

03 Improves direction-of-arrival estimation accuracy

Abstract

Direct-path relative transfer function (DP-RTF) refers to the ratio between the direct-path acoustic transfer functions of two microphone channels. Though DP-RTF fully encodes the sound spatial cues and serves as a reliable localization feature, it is often erroneously estimated in the presence of noise and reverberation. This paper proposes to learn DP-RTF with deep neural networks for robust binaural sound source localization. A DP-RTF learning network is designed to regress the binaural sensor signals to a real-valued representation of DP-RTF. It consists of a branched convolutional neural network module to separately extract the inter-channel magnitude and phase patterns, and a convolutional recurrent neural network module for joint feature learning. To better explore the speech spectra to aid the DP-RTF estimation, a monaural speech enhancement network is used to recover the…
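The definition in the abstract's first sentence can be illustrated with a small numerical sketch. It is not the paper's network or its feature pipeline. It assumes an idealized free-field model in which each channel's direct-path transfer function is a pure gain and delay, and it assumes one possible real-valued encoding (stacked real and imaginary parts). The function names, gains, and delays are hypothetical values chosen for illustration.

```python
import numpy as np

def direct_path_atf(freqs_hz, delay_s, gain):
    """Assumed free-field direct-path transfer function: pure gain and delay."""
    return gain * np.exp(-2j * np.pi * freqs_hz * delay_s)

def dp_rtf(h_ref, h_other):
    """DP-RTF as defined in the abstract: ratio of two direct-path ATFs."""
    return h_other / h_ref

def real_valued(rtf):
    """One possible real-valued encoding (an assumption): [real, imag]."""
    return np.concatenate([rtf.real, rtf.imag])

# Hypothetical source: reaches the right ear 0.25 ms later and 6 dB weaker.
freqs = np.array([250.0, 500.0, 1000.0])
h_left = direct_path_atf(freqs, delay_s=2.00e-3, gain=1.0)
h_right = direct_path_atf(freqs, delay_s=2.25e-3, gain=0.5)

rtf = dp_rtf(h_left, h_right)
level_diff_db = 20 * np.log10(np.abs(rtf))  # inter-channel level difference
phase_diff = np.angle(rtf)                  # inter-channel phase difference
features = real_valued(rtf)

print(level_diff_db)
print(phase_diff)
print(features.shape)
```

In this toy model the magnitude of the ratio carries the level difference and its phase carries the time difference between the two channels, which is why the ratio is useful as a localization feature. With noise and reverberation the true direct-path transfer functions are not directly observable, which is the estimation problem the abstract describes.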

Taxonomy

Topics: Speech and Audio Processing · Acoustic Wave Phenomena Research · Music and Audio Processing