Loss functions incorporating auditory spatial perception in deep learning -- a review
Boaz Rafaely, Stefan Weinzierl, Or Berebi, Fabian Brinkmann

TL;DR
This review surveys recent loss functions in binaural audio synthesis that incorporate perceptual spatial cues, emphasizing localization and room response, and discusses future directions for perceptually grounded loss design.
Contribution
The paper provides a comprehensive overview of loss functions incorporating spatial perception cues in binaural audio, highlighting current focus areas and future research opportunities.
Findings
Strong focus on localization cues such as interaural time differences (ITDs) and interaural level differences (ILDs).
Room reverberation attributes are less explored in loss functions.
Emerging methods estimate room parameters for improved spatial audio synthesis.
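As a rough illustration of the localization cues listed above, the sketch below compares a predicted binaural signal with a reference using two cues. The ILD is taken as the broadband left/right energy ratio in dB. The ITD is the lag of the interaural cross-correlation peak within ±1 ms. The function names (`ild_db`, `itd_seconds`, `spatial_cue_loss`), the ±1 ms search range, the single-band (broadband) treatment and the weights are hypothetical choices for illustration, not the specific losses surveyed in the review. Many published losses operate per frequency band, and the hard argmax used here is not differentiable. As written, the sketch is therefore an evaluation measure rather than a gradient-based training loss.

```python
import numpy as np


def ild_db(left, right, eps=1e-12):
    """Broadband interaural level difference in dB (left energy relative to right)."""
    e_l = np.sum(left ** 2) + eps
    e_r = np.sum(right ** 2) + eps
    return 10.0 * np.log10(e_l / e_r)


def itd_seconds(left, right, fs, max_itd=1e-3):
    """ITD from the peak of the interaural cross-correlation.

    The search is restricted to |lag| <= max_itd (assumed ±1 ms here).
    A positive value means the left channel lags the right one.
    """
    n = len(left)
    corr = np.correlate(left, right, mode="full")  # lags -(n-1) .. (n-1)
    lags = np.arange(-(n - 1), n)
    max_lag = int(round(max_itd * fs))
    mask = np.abs(lags) <= max_lag
    best_lag = lags[mask][np.argmax(corr[mask])]  # hard argmax: not differentiable
    return best_lag / fs


def spatial_cue_loss(pred, ref, fs, w_ild=1.0, w_itd=1.0):
    """Hypothetical weighted cue error between two (2, n_samples) binaural signals.

    ILD error is measured in dB and ITD error in microseconds. The weights only
    balance these different units and are illustrative assumptions.
    """
    ild_err = abs(ild_db(*pred) - ild_db(*ref))
    itd_err_us = abs(itd_seconds(*pred, fs) - itd_seconds(*ref, fs)) * 1e6
    return w_ild * ild_err + w_itd * itd_err_us


# Usage: the left channel is a delayed (10 samples) and attenuated (x0.5) copy of the right.
rng = np.random.default_rng(0)
fs = 48000
x = rng.standard_normal(4800)
d = 10
left = np.concatenate([np.zeros(d), 0.5 * x[:-d]])
ref = np.stack([left, x])
print(ild_db(*ref), itd_seconds(*ref, fs))  # roughly -6 dB and 10 / 48000 s
```

Comparing a prediction against the reference combines both cue errors into a single scalar. In practice, the relative weighting of the dB and microsecond terms would need tuning, or perceptual normalization, for the task at hand.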
Abstract
Binaural reproduction aims to deliver immersive spatial audio with high perceptual realism over headphones. Loss functions play a central role in optimizing and evaluating algorithms that generate binaural signals. However, traditional signal-related difference measures often fail to capture the perceptual properties that are essential to spatial audio quality. This review paper surveys recent loss functions that incorporate spatial perception cues relevant to binaural reproduction. It focuses on losses applied to binaural signals, which are often derived from microphone recordings or Ambisonics signals, while excluding those based on room impulse responses. Guided by the Spatial Audio Quality Inventory (SAQI), the review emphasizes perceptual dimensions related to source localization and room response, while excluding general spectral-temporal attributes. The literature survey reveals…
