Evaluating Objective Speech Quality Metrics for Neural Audio Codecs

Luca A. Lanzendörfer; Florian Grötschla

arXiv:2511.19734·cs.SD·November 26, 2025

Open Access

TL;DR

This paper evaluates the effectiveness of existing objective speech quality metrics in assessing neural audio codecs, comparing them to human listening tests to identify which metrics reliably reflect perceived audio quality.

Contribution

It provides an empirical analysis of objective metrics' correlation with human perception for neural audio codecs, offering guidance for future evaluations.

Findings

01 Some metrics correlate well with human perception

02 Certain metrics fail to capture relevant distortions

03 Guidance for selecting evaluation metrics in neural audio codecs

Abstract

Neural audio codecs have gained recent popularity for their use in generative modeling as they offer high-fidelity audio reconstruction at low bitrates. While human listening studies remain the gold standard for assessing perceptual quality, they are time-consuming and impractical. In this work, we examine the reliability of existing objective quality metrics in assessing the performance of recent neural audio codecs. To this end, we conduct a MUSHRA listening test on high-fidelity speech signals and analyze the correlation between subjective scores and widely used objective metrics. Our results show that, while some metrics align well with human perception, others struggle to capture relevant distortions. Our findings provide practical guidance for selecting appropriate evaluation metrics when using neural audio codecs for speech.
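
To make the correlation analysis described in the abstract concrete, here is a minimal sketch of how per-system MUSHRA scores could be compared against an objective metric. The abstract does not state which correlation coefficients the authors use; Pearson and Spearman are assumed here because they are common choices for this kind of comparison. All scores and the names `mushra`, `metric_a`, and `metric_b` are hypothetical illustration values, not results from the paper.

```python
from statistics import mean


def pearson(x, y):
    """Pearson linear correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical mean MUSHRA scores (0-100 scale) for five codec conditions.
mushra = [90, 75, 60, 40, 20]
# Hypothetical objective scores for the same five conditions.
metric_a = [4.2, 3.9, 3.1, 2.5, 1.8]  # ordered the same way as the listeners
metric_b = [3.0, 3.4, 2.9, 3.3, 3.1]  # ordering disagrees with the listeners

for name, scores in [("metric_a", metric_a), ("metric_b", metric_b)]:
    print(name, "pearson:", round(pearson(mushra, scores), 3),
          "spearman:", round(spearman(mushra, scores), 3))
```

A metric that ranks the conditions in the same order as the listeners gets a Spearman coefficient near 1, while a metric that misses the relevant distortions lands much closer to 0. In practice a library routine such as `scipy.stats.spearmanr` would also report a p-value, which matters when only a handful of conditions are compared.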

Taxonomy

Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Speech Recognition and Synthesis