Interaural time difference loss for binaural target sound extraction
Carlos Hernandez-Olivan, Marc Delcroix, Tsubasa Ochiai, Naohiro Tawara, Tomohiro Nakatani, Shoko Araki

TL;DR
This paper introduces an interaural time difference (ITD) loss for binaural target sound extraction. The loss improves how well neural network-based systems preserve spatial cues such as the interaural level difference (ILD), interaural phase difference (IPD), and ITD.
Contribution
The paper proposes a new ITD loss function for binaural target sound extraction that preserves spatial cues better than training with signal-level losses alone.
Findings
Adding spatial losses improves cue preservation
The ITD loss outperforms other spatial losses in experiments
Signal-level metrics are maintained with the new loss
Abstract
Binaural target sound extraction (TSE) aims to extract a desired sound from a binaural mixture of arbitrary sounds while preserving the spatial cues of the desired sound. Indeed, for many applications, the target sound signal and its spatial cues carry important information about the sound source. Binaural TSE can be realized with a neural network trained to output only the desired sound given a binaural mixture and an embedding characterizing the desired sound class as inputs. Conventional TSE systems are trained using signal-level losses, which measure the difference between the extracted and reference signals for the left and right channels. In this paper, we propose adding explicit spatial losses to better preserve the spatial cues of the target sound. In particular, we explore losses aiming at preserving the interaural level (ILD), phase (IPD), and time differences (ITD). We show…
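To make the distinction between signal-level and spatial losses concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's formulation: the function names (`ild_db`, `itd_seconds`, `toy_losses`), the broadband ILD definition, the GCC-PHAT peak picking used to estimate the ITD, and the 1 ms search range are all hypothetical choices made for this example. In particular, the argmax-based ITD estimate is not differentiable, so a loss used for training would need a different construction.

```python
import numpy as np

def ild_db(left, right, eps=1e-8):
    """Broadband interaural level difference in dB (left relative to right)."""
    return 10.0 * np.log10((np.sum(left ** 2) + eps) / (np.sum(right ** 2) + eps))

def itd_seconds(left, right, fs, max_itd=1e-3):
    """Estimate the ITD as the lag of the GCC-PHAT peak within +/- max_itd.

    Sign convention (an assumption of this sketch): a negative value means
    the right channel lags the left channel.
    """
    n = len(left) + len(right)              # zero-pad to avoid circular wrap-around
    spec_l = np.fft.rfft(left, n=n)
    spec_r = np.fft.rfft(right, n=n)
    cross = spec_l * np.conj(spec_r)
    cross = cross / (np.abs(cross) + 1e-12)  # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    max_lag = int(round(max_itd * fs))
    # Reorder so that index 0 corresponds to lag -max_lag.
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    return (int(np.argmax(cc)) - max_lag) / fs

def toy_losses(est, ref, fs):
    """est, ref: arrays of shape (2, T) holding the left and right channels."""
    signal = float(np.mean((est - ref) ** 2))  # signal-level term (plain MSE here)
    ild = abs(ild_db(est[0], est[1]) - ild_db(ref[0], ref[1]))
    itd = abs(itd_seconds(est[0], est[1], fs) - itd_seconds(ref[0], ref[1], fs))
    return {"signal": signal, "ild_db": float(ild), "itd_s": float(itd)}

# Toy reference: the right channel is the left channel delayed by 8 samples.
fs, delay, length = 16000, 8, 4000
rng = np.random.default_rng(0)
src = rng.standard_normal(length + delay)
ref = np.stack([src[delay:], src[:length]])
swapped = ref[::-1].copy()                   # left and right exchanged
```

Calling `toy_losses(swapped, ref, fs)` on this toy data shows why a spatial term can be informative: the swapped estimate contains exactly the same waveforms as the reference, yet its ITD has the opposite sign, which the `itd_s` entry exposes directly as a timing error.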
Taxonomy
Topics: Underwater Acoustics Research · Speech and Audio Processing · Advanced SAR Imaging Techniques
