DPLM: A Deep Perceptual Spatial-Audio Localization Metric
Pranay Manocha, Anurag Kumar, Buye Xu, Anjali Menon, Israel D. Gebru, Vamsi K. Ithapu, Paul Calamia

TL;DR
This paper introduces DPLM, a deep learning-based metric for evaluating the perceptual quality of spatial audio localization, which correlates well with subjective ratings without requiring human-labeled training data.
Contribution
The paper presents a novel, general-purpose spatial-localization metric built on activation-level distances from deep networks trained for direction-of-arrival (DOA) estimation. It correlates better with subjective evaluations than the baseline metrics it is compared against.
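The core idea, comparing two binaural recordings through the internal activations of a network trained for DOA estimation, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The two-layer ReLU network with random weights is a hypothetical stand-in for a trained DOA model. The concatenated raw left/right samples stand in for whatever input features DPLM actually uses. The unweighted mean of per-layer L1 distances between unit-normalized activations is an assumed aggregation rule. The names `layer_activations` and `activation_distance` are hypothetical.

```python
import numpy as np

def layer_activations(x, weights):
    """Return the activation of every layer of a toy ReLU network.

    Hypothetical stand-in for a network trained for direction-of-arrival (DOA)
    estimation; the real DPLM backbone is not reproduced here.
    """
    acts = []
    h = x
    for w in weights:
        h = np.maximum(w @ h, 0.0)  # linear layer followed by ReLU
        acts.append(h)
    return acts

def activation_distance(ref, test, weights, eps=1e-8):
    """Mean over layers of the L1 distance between unit-normalized activations.

    The unweighted mean across layers is an assumption made for illustration.
    """
    dists = []
    for r, t in zip(layer_activations(ref, weights), layer_activations(test, weights)):
        r = r / (np.linalg.norm(r) + eps)  # normalize so layers are comparable
        t = t / (np.linalg.norm(t) + eps)
        dists.append(np.abs(r - t).mean())
    return float(np.mean(dists))

# Toy usage with random weights and a synthetic binaural signal.
rng = np.random.default_rng(0)
n = 256
weights = [rng.standard_normal((64, 2 * n)) / np.sqrt(2 * n),
           rng.standard_normal((32, 64)) / np.sqrt(64)]
left = rng.standard_normal(n)
ref = np.concatenate([left, left])                  # same signal at both ears
shifted = np.concatenate([left, np.roll(left, 3)])  # circular shift: crude interaural delay
print(activation_distance(ref, ref, weights))       # 0.0
print(activation_distance(ref, shifted, weights))
```

Identical recordings yield a distance of exactly zero, and any change that alters the network's internal response (here, a small shift of one channel) yields a positive distance. In a real metric, the network's training on DOA is what makes these activation differences track localization cues rather than arbitrary signal changes.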
Findings
DPLM correlates strongly with subjective ratings.
DPLM outperforms baseline metrics across diverse datasets.
No human-labeled training data needed for DPLM.
Abstract
Subjective evaluations are critical for assessing the perceptual realism of sounds in audio-synthesis-driven technologies like augmented and virtual reality. However, they are challenging to set up, fatiguing for users, and expensive. In this work, we tackle the problem of capturing the perceptual characteristics of localizing sounds. Specifically, we propose a framework for building a general-purpose quality metric to assess spatial localization differences between two binaural recordings. We model localization similarity by utilizing activation-level distances from deep networks trained for direction-of-arrival (DOA) estimation. Our proposed metric (DPLM) outperforms baseline metrics in correlation with subjective ratings across a diverse set of datasets, even without the benefit of any human-labeled training data.
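Agreement with listeners is typically quantified by correlating metric outputs with subjective ratings over a set of test pairs. The sketch below computes Pearson and Spearman correlations using NumPy only; the function names and the toy numbers are hypothetical, and the paper's exact evaluation protocol (rating scale, per-dataset aggregation) is not reproduced here. Because the metric is a distance, a good metric should correlate positively with ratings of perceived difference and negatively with ratings of similarity, so the expected sign depends on how the listening test was scored.

```python
import numpy as np

def rankdata(x):
    """Ranks starting at 1; tied values receive the average of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(a, b):
    """Linear (Pearson) correlation coefficient."""
    return float(np.corrcoef(a, b)[0, 1])

def spearman(a, b):
    """Rank (Spearman) correlation: the Pearson correlation of the ranks."""
    return pearson(rankdata(a), rankdata(b))

# Hypothetical toy data: metric distances and perceived-difference ratings
# for four recording pairs (not results from the paper).
distances = [0.1, 0.4, 0.2, 0.8]
ratings = [1.0, 3.0, 2.0, 4.0]
print(round(spearman(distances, ratings), 6))  # 1.0
print(pearson(distances, ratings))
```

Spearman correlation only checks whether the metric orders pairs the same way listeners do, while Pearson also rewards a linear relationship; reporting both is a common way to separate ranking agreement from scale agreement.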
