Binaural Target Speaker Extraction using Individualized HRTF
Yoav Ellinson, Sharon Gannot

TL;DR
This paper introduces a speaker-independent binaural target speaker extraction method using individualized HRTF and complex-valued neural networks, effectively isolating the target speech while preserving binaural cues in various acoustic conditions.
Contribution
It presents a novel complex-valued neural network approach leveraging individualized HRTF for binaural target speaker extraction without needing speaker embeddings.
Findings
Achieves high extraction performance in anechoic conditions
Maintains speech clarity and binaural cues in reverberant environments
Performs comparably to state-of-the-art methods in noise reduction
Abstract
In this work, we address the problem of binaural target-speaker extraction in the presence of multiple simultaneous talkers. We propose a novel approach that leverages the individual listener's Head-Related Transfer Function (HRTF) to isolate the target speaker. The proposed method is speaker-independent, as it does not rely on speaker embeddings. We employ a fully complex-valued neural network that operates directly on the complex-valued Short-Time Fourier Transform (STFT) of the mixed audio signals, and compare it to a Real-Imaginary (RI)-based neural network, demonstrating the advantages of the former. We first evaluate the method in an anechoic, noise-free scenario, achieving excellent extraction performance while preserving the binaural cues of the target signal. We then extend the evaluation to reverberant conditions. Our method proves robust, maintaining speech clarity and…
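The two input representations contrasted in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`stft`, `binaural_features`, `complex_linear`), the Hann window, and the FFT and hop sizes are all assumptions chosen for illustration. The sketch computes the complex STFT of a two-channel (left/right ear) mixture, builds the equivalent RI-stacked real tensor, and applies a single complex-valued channel-mixing map of the kind a complex-valued layer would learn.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Complex STFT of a 1-D signal (Hann window); returns shape (frames, bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=-1)

def binaural_features(left, right, n_fft=512, hop=128):
    """Return the complex input (2, T, F) and the RI-stacked real input (4, T, F)."""
    spec = np.stack([stft(left, n_fft, hop), stft(right, n_fft, hop)])
    # RI representation: real parts of both ears, then imaginary parts.
    ri = np.concatenate([spec.real, spec.imag], axis=0)
    return spec, ri

def complex_linear(z, w):
    """Complex channel mixing: out[o] = sum_c w[o, c] * z[c] per time-frequency bin."""
    return np.einsum("oc,ctf->otf", w, z)

# Hypothetical one-second binaural mixture at 16 kHz (random noise as a stand-in).
rng = np.random.default_rng(0)
left = rng.standard_normal(16000)
right = rng.standard_normal(16000)

spec, ri = binaural_features(left, right)
w = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
out = complex_linear(spec, w)
```

A complex weight multiplies real and imaginary parts jointly (a rotation and scaling in the complex plane), whereas a real-valued network on the RI tensor treats the four channels as unrelated real inputs. The abstract reports that the fully complex-valued network performs better; the sketch only shows how the two inputs differ, not why.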
