What You Hear Is What You See: Audio Quality Metrics From Image Quality Metrics
Tashi Namgyal, Alexander Hepburn, Raul Santos-Rodriguez, Valero Laparra, Jesus Malo

TL;DR
This paper tests whether image perceptual metrics, applied to spectrograms, can evaluate audio quality. It motivates the approach by the similarity between auditory and visual neural processing, customises one metric for sound signals, and reports promising correlation with human judgments.
Contribution
It introduces a novel approach of applying and customizing image quality metrics to assess audio signals via spectrograms, bridging visual and auditory perceptual models.
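The general idea of scoring audio through an image metric on spectrograms can be sketched as follows. This is a minimal illustration, not the authors' method: the frame length, hop size, log compression, and the single-window SSIM are all assumptions chosen for brevity, and the function names `log_spectrogram` and `global_ssim` are hypothetical.

```python
import numpy as np

def log_spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude STFT: rows are frequency bins, columns are time frames.

    frame_len and hop are illustrative choices, not values from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    return np.log1p(magnitude)

def global_ssim(x, y):
    """SSIM computed over the whole image as one window.

    Standard SSIM averages over local windows; this global form is a
    simplification used here as a stand-in for an image quality metric.
    """
    data_range = max(x.max(), y.max()) - min(x.min(), y.min())
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

# Synthetic test signals (assumed for illustration): a 440 Hz tone
# with light and heavy additive Gaussian noise.
sr = 8000
t = np.arange(sr) / sr
reference = np.sin(2 * np.pi * 440 * t)
rng = np.random.default_rng(0)
mild = reference + 0.01 * rng.standard_normal(sr)
heavy = reference + 0.5 * rng.standard_normal(sr)

ref_img = log_spectrogram(reference)
score_mild = global_ssim(ref_img, log_spectrogram(mild))
score_heavy = global_ssim(ref_img, log_spectrogram(heavy))
print(score_mild, score_heavy)
```

In this sketch the lightly degraded signal scores closer to 1 than the heavily degraded one, which is the behaviour a quality metric should show. Any full-reference image metric could be substituted for `global_ssim`.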
Findings
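Image perceptual metrics applied to spectrograms show promising correlation with human ratings of audio quality on a music dataset.
One metric with a psychoacoustically plausible architecture is customised to account for the peculiarities of sound signals.
The results suggest that the approach is feasible for objective audio quality assessment.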
Abstract
In this study, we investigate the feasibility of utilizing state-of-the-art image perceptual metrics for evaluating audio signals by representing them as spectrograms. The encouraging outcome of the proposed approach is based on the similarity between the neural mechanisms in the auditory and visual pathways. Furthermore, we customise one of the metrics which has a psychoacoustically plausible architecture to account for the peculiarities of sound signals. We evaluate the effectiveness of our proposed metric and several baseline metrics using a music dataset, with promising results in terms of the correlation between the metrics and the perceived quality of audio as rated by human evaluators.
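The evaluation step, comparing metric outputs against human quality ratings, can be sketched as a rank correlation. The abstract does not say which correlation coefficient is used, so Spearman's is an assumption here, and the numbers below are made up purely for illustration; they are not results from the paper.

```python
import numpy as np

def rank(values):
    """Ranks starting at 1. Ties are not averaged, so values must be distinct."""
    order = np.argsort(values)
    ranks = np.empty(len(values))
    ranks[order] = np.arange(1, len(values) + 1)
    return ranks

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return float(np.corrcoef(rank(a), rank(b))[0, 1])

# Hypothetical data: one metric distance and one mean human rating per clip.
metric_distance = np.array([0.05, 0.12, 0.30, 0.45, 0.80])
human_rating = np.array([4.8, 4.1, 3.5, 3.9, 1.2])

rho = spearman(metric_distance, human_rating)
print(rho)
```

A strongly negative value means that larger metric distances go with lower human ratings, which is the desired behaviour for a distance-type metric. For real data with tied ratings, a library routine that averages tied ranks would be a safer choice than this helper.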
Taxonomy
Topics: Hearing Loss and Rehabilitation · Image and Signal Denoising Methods · Noise Effects and Management
