The PESQetarian: On the Relevance of Goodhart's Law for Speech Enhancement
Danilo de Oliveira, Simon Welker, Julius Richter, Timo Gerkmann

TL;DR
This paper illustrates the risk of overfitting speech enhancement models to the metric used for evaluation, such as PESQ. It shows that a high score on the optimized metric does not necessarily imply better perceptual quality, and it advocates including other metrics in the evaluation.
Contribution
The paper introduces the PESQetarian, a speech enhancement model that exploits the widely used PESQ measure, and uses it to demonstrate the pitfalls of overfitting to a single evaluation metric.
Findings
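A high PESQ score can be achieved without a perceptual quality improvement: the PESQetarian reaches 3.82 PESQ on VB-DMD while scoring very poorly in a listening experiment.
Optimizing solely for PESQ can degrade aspects of the signal that PESQ does not capture, resulting in a poor listening experience.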
Multi-metric evaluation is essential for reliable speech enhancement assessment.
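As a minimal sketch of what a multi-metric check could look like in practice, the snippet below flags metrics that got worse while the optimized metric improved. It is not the authors' evaluation procedure. The function name `regressed_metrics`, the metric keys, and all numbers are hypothetical and invented for illustration, and the sketch assumes every metric is "higher is better".

```python
from typing import Dict, List


def regressed_metrics(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    optimized: str,
    tolerance: float = 0.0,
) -> List[str]:
    """Return metrics that got worse while the optimized metric improved.

    Assumes every metric is "higher is better" and that both dicts
    share the same keys. The name and signature are hypothetical.
    """
    if candidate[optimized] <= baseline[optimized]:
        return []  # the optimized metric did not improve; nothing to flag
    return sorted(
        name
        for name in baseline
        if name != optimized and candidate[name] < baseline[name] - tolerance
    )


# Toy numbers, invented for illustration only (not results from the paper).
baseline = {"pesq": 3.0, "estoi": 0.85, "si_sdr": 17.0}
candidate = {"pesq": 3.8, "estoi": 0.80, "si_sdr": 9.0}

flagged = regressed_metrics(baseline, candidate, optimized="pesq")
print(flagged)
```

A non-empty result is only a warning sign that the gain on the optimized metric may not generalize. It does not replace a listening experiment.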
Abstract
To obtain improved speech enhancement models, researchers often focus on increasing performance according to specific instrumental metrics. However, when the same metric is used in a loss function to optimize models, it may be detrimental to aspects that the given metric does not see. The goal of this paper is to illustrate the risk of overfitting a speech enhancement model to the metric used for evaluation. For this, we introduce enhancement models that exploit the widely used PESQ measure. Our "PESQetarian" model achieves 3.82 PESQ on VB-DMD while scoring very poorly in a listening experiment. While the obtained PESQ value of 3.82 would imply "state-of-the-art" PESQ-performance on the VB-DMD benchmark, our examples show that when optimizing w.r.t. a metric, an isolated evaluation on the same metric may be misleading. Instead, other metrics should be included in the evaluation and the…
