A Differentiable Perceptual Audio Metric Learned from Just Noticeable Differences
Pranay Manocha, Adam Finkelstein, Richard Zhang, Nicholas J. Bryan, Gautham J. Mysore, Zeyu Jin

TL;DR
This paper introduces a deep neural network-based perceptual audio metric trained on crowdsourced human judgments to accurately reflect human perception of audio differences, especially near the just-noticeable difference threshold.
Contribution
It presents a novel differentiable perceptual audio metric learned from a large dataset of human judgments, improving correlation with human perception over existing metrics.
Findings
The learned metric outperforms baseline methods in correlating with human judgments.
Replacing traditional loss functions with this metric improves audio denoising results.
The metric is effective as a differentiable loss function for audio processing tasks.
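To make the idea of a learned distance used as a loss concrete, the following is a minimal sketch of a layer-wise feature distance. Everything in it is an assumption for illustration: the fixed kernels in `KERNELS` stand in for trained network weights, the per-layer mean absolute difference is one plausible way to compare activations, and the names `conv1d_relu`, `layer_activations`, and `feature_distance` are hypothetical. It is not the authors' architecture. In a framework with automatic differentiation, the same computation could be backpropagated through and used as a training loss.

```python
from typing import List, Sequence

# Hypothetical fixed kernels standing in for learned network weights.
KERNELS: List[List[float]] = [[0.5, 0.5], [1.0, -1.0], [0.25, 0.5, 0.25]]


def conv1d_relu(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Valid-mode sliding dot product (cross-correlation) followed by a ReLU."""
    k = len(kernel)
    return [
        max(0.0, sum(signal[i + j] * kernel[j] for j in range(k)))
        for i in range(len(signal) - k + 1)
    ]


def layer_activations(signal: Sequence[float]) -> List[List[float]]:
    """Run the signal through each layer and keep every intermediate output."""
    activations = []
    current = list(signal)
    for kernel in KERNELS:
        current = conv1d_relu(current, kernel)
        activations.append(current)
    return activations


def feature_distance(reference: Sequence[float], degraded: Sequence[float]) -> float:
    """Sum over layers of the mean absolute difference between activations."""
    total = 0.0
    for ref_act, deg_act in zip(layer_activations(reference), layer_activations(degraded)):
        total += sum(abs(r - d) for r, d in zip(ref_act, deg_act)) / len(ref_act)
    return total


clean = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
noisy = [0.0, 0.6, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]  # one sample perturbed

same = feature_distance(clean, clean)
diff = feature_distance(clean, noisy)
```

Here `same` is zero because identical inputs produce identical activations, while `diff` is positive because the perturbation changes the first-layer output. Comparing activations at several depths, rather than raw samples, is what lets such a distance weight differences by how the network represents them.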
Abstract
Many audio processing tasks require perceptual assessment. The "gold standard" of obtaining human judgments is time-consuming, expensive, and cannot be used as an optimization criterion. On the other hand, automated metrics are efficient to compute but often correlate poorly with human judgment, particularly for audio differences at the threshold of human detection. In this work, we construct a metric by fitting a deep neural network to a new large dataset of crowdsourced human judgments. Subjects are prompted to answer a straightforward, objective question: are two recordings identical or not? These pairs are algorithmically generated under a variety of perturbations, including noise, reverb, and compression artifacts; the perturbation space is probed with the goal of efficiently identifying the just-noticeable difference (JND) level of the subject. We show that the resulting learned…
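The abstract's idea of probing the perturbation space to locate a subject's JND can be illustrated with a simple bisection over perturbation strength. This is a sketch under strong assumptions, not the paper's procedure: it assumes a single scalar strength in [0, 1], and a noiseless, monotonic listener (every strength at or above the threshold is heard, every strength below it is not). The names `estimate_jnd` and `simulated_listener`, and the threshold value 0.3, are hypothetical.

```python
from typing import Callable


def estimate_jnd(
    hears_difference: Callable[[float], bool],
    low: float = 0.0,
    high: float = 1.0,
    num_trials: int = 12,
) -> float:
    """Bisect perturbation strength toward the smallest audible level.

    Assumes `low` is inaudible and `high` is audible at the start, and that
    the listener's answers are consistent (no response noise).
    """
    for _ in range(num_trials):
        mid = (low + high) / 2.0
        if hears_difference(mid):
            high = mid  # audible: the threshold is at or below mid
        else:
            low = mid  # inaudible: the threshold is above mid
    return (low + high) / 2.0


def simulated_listener(strength: float) -> bool:
    """Hypothetical subject who reports a difference at strength 0.3 or more."""
    return strength >= 0.3


jnd = estimate_jnd(simulated_listener)
```

Each trial halves the interval that must contain the threshold, so twelve "same or different?" answers narrow a unit range to a width of 1/4096. Real listeners answer inconsistently near the threshold, so a practical procedure would need to tolerate noisy responses.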
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Hearing Loss and Rehabilitation
