Audio Similarity is Unreliable as a Proxy for Audio Quality
Pranay Manocha, Zeyu Jin, Adam Finkelstein

TL;DR
This paper demonstrates that audio similarity metrics often fail to accurately reflect human perception of audio quality, highlighting their unreliability as proxies for true audio quality assessment.
Contribution
The study identifies specific scenarios where similarity metrics diverge from human judgments and shows that, in those scenarios, no-reference metrics correlate better with perceived quality.
Findings
Similarity metrics vary with clean references
Metrics are sensitive to imperceptible differences
No-reference metrics correlate better with human perception
Abstract
Many audio processing tasks require perceptual assessment. However, the time and expense of obtaining "gold standard" human judgments limit the availability of such data. Most applications incorporate full-reference or other similarity-based metrics (e.g., PESQ) that depend on a clean reference. Researchers have relied on such metrics to evaluate and compare various proposed methods, often concluding that small measured differences imply one method is more effective than another. This paper demonstrates several practical scenarios where similarity metrics fail to agree with human perception, because they: (1) vary with clean references; (2) rely on attributes that humans factor out when considering quality; and (3) are sensitive to imperceptible signal level differences. In those scenarios, we show that no-reference metrics do not suffer from such shortcomings and correlate better with human…
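Point (3) can be illustrated with a minimal sketch. It assumes plain signal-to-noise ratio (SNR) as a stand-in for a full-reference similarity metric; it is not PESQ and not the paper's experimental setup. The synthetic 440 Hz tone, the 16 kHz sample rate, and the helper names `snr_db` and `gain_db` are hypothetical choices made for illustration. A copy of the reference that is merely 5% quieter (a level change of under half a decibel, with no added noise or distortion) receives a finite score of about 26 dB, whereas an identical copy scores infinity. The metric therefore penalizes a pure level difference that listeners may not notice.

```python
import math

def snr_db(reference, estimate):
    """Full-reference SNR in dB: reference energy over the energy of the difference.

    Hypothetical helper used here as a simple stand-in for a similarity metric.
    """
    signal = sum(r * r for r in reference)
    noise = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if noise == 0.0:
        return math.inf  # identical signals: the metric saturates
    return 10.0 * math.log10(signal / noise)

def gain_db(gain):
    """Level change in dB corresponding to a linear amplitude gain."""
    return 20.0 * math.log10(gain)

# One second of a 440 Hz tone at 16 kHz serves as an illustrative "clean" reference.
sr = 16000
reference = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]

# The same tone, only 5% quieter: content is unchanged, only the level differs.
scaled = [0.95 * x for x in reference]

print(f"level change: {gain_db(0.95):.2f} dB")
print(f"SNR vs reference: {snr_db(reference, scaled):.1f} dB")
```

For a pure gain g, the error is (1 - g) times the reference, so the score reduces to -20 log10(1 - g) regardless of signal content. Any metric that compares waveforms sample by sample inherits some version of this level sensitivity unless it explicitly normalizes for gain.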
