DPLM: A Deep Perceptual Spatial-Audio Localization Metric

Pranay Manocha; Anurag Kumar; Buye Xu; Anjali Menon; Israel D. Gebru; Vamsi K. Ithapu; Paul Calamia

arXiv:2105.14180·eess.AS·December 22, 2021

TL;DR

This paper introduces DPLM, a deep-learning-based metric that scores spatial localization differences between two binaural recordings. It correlates with subjective ratings better than baseline metrics, without requiring any human-labeled training data.

Contribution

The paper presents a general-purpose spatial localization metric built on activation-level distances from deep networks trained for direction-of-arrival (DOA) estimation. The metric outperforms baseline metrics in correlation with subjective ratings.

Findings

01. DPLM correlates strongly with subjective ratings.

02. DPLM outperforms baseline metrics across diverse datasets.

03. No human-labeled training data needed for DPLM.
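
The findings above rest on measuring how well a metric's scores track listener judgments. A minimal sketch of one way to compute such agreement follows. This summary does not say which correlation statistic is used, so Spearman rank correlation is an assumption here, and the function names and numbers are hypothetical placeholders rather than anything taken from the paper.

```python
import numpy as np

def ranks(values):
    """Rank values from 0 to n-1 (assumes no ties, for brevity)."""
    order = np.argsort(values)
    r = np.empty(len(values), dtype=float)
    r[order] = np.arange(len(values))
    return r

def spearman(metric_scores, subjective_ratings):
    """Spearman rank correlation, computed as the Pearson correlation of ranks."""
    a = ranks(np.asarray(metric_scores, dtype=float))
    b = ranks(np.asarray(subjective_ratings, dtype=float))
    return float(np.corrcoef(a, b)[0, 1])

# Made-up placeholder numbers, NOT results from the paper.
distances = [0.1, 0.4, 0.35, 0.8, 0.6]  # hypothetical metric output (larger = more different)
ratings = [4.8, 3.9, 4.1, 1.5, 2.7]     # hypothetical listener similarity ratings

rho = spearman(distances, ratings)
print(f"Spearman correlation: {rho:.2f}")
```

Because the hypothetical metric is a distance and the hypothetical ratings are similarity scores, good agreement shows up as a strongly negative correlation. Real listening-test data usually contains tied ratings, which this simplified ranking does not handle.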

Abstract

Subjective evaluations are critical for assessing the perceptual realism of sounds in audio-synthesis-driven technologies like augmented and virtual reality. However, they are challenging to set up, fatiguing for users, and expensive. In this work, we tackle the problem of capturing the perceptual characteristics of localizing sounds. Specifically, we propose a framework for building a general-purpose quality metric to assess spatial localization differences between two binaural recordings. We model localization similarity by utilizing activation-level distances from deep networks trained for direction-of-arrival (DOA) estimation. Our proposed metric (DPLM) outperforms baseline metrics on correlation with subjective ratings on a diverse set of datasets, even without the benefit of any human-labeled training data.
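
The idea of an activation-level distance can be illustrated with a small sketch. This is not the authors' implementation: the network below is a toy stand-in with random weights instead of a trained DOA estimator, and the input shape, the layer sizes, and the choice to sum the mean absolute difference over layers are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a DOA network: three dense layers with random weights.
# In the setting the abstract describes, these would be layers of a
# network trained for direction-of-arrival estimation.
LAYER_SIZES = [(512, 128), (128, 64), (64, 32)]  # hypothetical sizes
WEIGHTS = [rng.standard_normal(s) / np.sqrt(s[0]) for s in LAYER_SIZES]

def layer_activations(binaural):
    """Return the ReLU activations of every layer for a (2, 256) signal."""
    x = np.asarray(binaural, dtype=float).reshape(-1)  # (2, 256) -> (512,)
    outputs = []
    for w in WEIGHTS:
        x = np.maximum(x @ w, 0.0)
        outputs.append(x)
    return outputs

def activation_distance(reference, test):
    """Sum over layers of the mean absolute activation difference.

    The aggregation rule is an assumption; the abstract only states that
    activation-level distances are used.
    """
    total = 0.0
    for a, b in zip(layer_activations(reference), layer_activations(test)):
        total += np.mean(np.abs(a - b))
    return float(total)

reference = rng.standard_normal((2, 256))  # placeholder left/right channels
swapped = reference[::-1]                  # same audio with the ears swapped

d_same = activation_distance(reference, reference)
d_swapped = activation_distance(reference, swapped)
print(d_same, d_swapped)
```

Comparing a recording with itself gives a distance of zero, while swapping the left and right channels changes the activations and gives a positive distance. With a trained DOA network in place of the random weights, the expectation is that the size of this distance would reflect how far the perceived source location has moved.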
