Deep Noise Suppression Maximizing Non-Differentiable PESQ Mediated by a Non-Intrusive PESQNet
Ziyi Xu, Maximilian Strake, Tim Fingscheidt

TL;DR
This paper introduces PESQNet, a non-intrusive neural network that estimates PESQ scores. Used as a differentiable training loss, it allows speech enhancement models to be optimized for perceptual quality and improves on conventional MSE-based training.
Contribution
The paper presents PESQNet, a DNN that estimates PESQ scores non-intrusively, that is, without a clean reference signal, and uses it to mediate the training of DNS models. An alternating training scheme updates the DNS model and PESQNet in turn.
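The idea of an alternating training scheme can be illustrated with a deliberately tiny sketch. It is not the authors' architecture or loss: the one-parameter "enhancer" `y = w * x`, the quantized `blackbox_score` standing in for PESQ, the linear surrogate standing in for PESQNet, and all constants are assumptions chosen only to show the two alternating phases. In phase 1 the enhancer is frozen and the surrogate is fitted to the black-box scores of the current outputs; in phase 2 the surrogate is frozen and its gradient is used to update the enhancer.

```python
def blackbox_score(y, s=1.0):
    """Hypothetical stand-in for an intrusive, non-differentiable metric.

    It needs the clean reference s, and the rounding makes it piecewise
    constant in y, so its gradient is zero almost everywhere.
    """
    return round(4.5 - 3.5 * min(1.0, abs(y - s)), 1)


def fit_surrogate(ys, scores):
    """Fit the non-intrusive surrogate q(y) = a + b * y by least squares.

    The surrogate sees only the enhanced output y, never the reference.
    """
    n = len(ys)
    mean_y = sum(ys) / n
    mean_s = sum(scores) / n
    var = sum((y - mean_y) ** 2 for y in ys)
    cov = sum((y - mean_y) * (sc - mean_s) for y, sc in zip(ys, scores))
    b = cov / var if var > 0 else 0.0
    a = mean_s - b * mean_y
    return a, b


def alternating_training(xs, w=0.2, rounds=50, lr=0.01):
    """Alternate between surrogate fitting and enhancer updates.

    xs are toy noisy inputs; the toy enhancer output is y = w * x.
    Returns the final w and the mean black-box score before each round.
    """
    history = []
    mean_x = sum(xs) / len(xs)
    for _ in range(rounds):
        ys = [w * x for x in xs]
        scores = [blackbox_score(y) for y in ys]
        history.append(sum(scores) / len(scores))

        # Phase 1: enhancer frozen, surrogate fitted to the true scores.
        _, b = fit_surrogate(ys, scores)

        # Phase 2: surrogate frozen, gradient ascent on the mean estimated
        # score; d/dw mean(a + b * w * x) = b * mean(x).
        w += lr * b * mean_x
    return w, history


if __name__ == "__main__":
    toy_inputs = [1.5 + 0.1 * i for i in range(11)]
    final_w, scores = alternating_training(toy_inputs)
    print(f"w = {final_w:.3f}, mean score {scores[0]:.2f} -> {scores[-1]:.2f}")
```

Refitting the surrogate every round matters in this sketch: a linear fit is only locally valid, so a surrogate fitted once to the initial outputs would keep pointing in the same direction after the enhancer has moved past the optimum.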
Findings
PESQNet-mediated training improves PESQ by 0.1 points over MSE-based training.
The method outperforms the Interspeech 2021 DNS Challenge baseline.
Enhanced perceptual quality is achieved by optimizing a differentiable PESQ estimate in place of the non-differentiable PESQ algorithm.
Abstract
Speech enhancement employing deep neural networks (DNNs) for denoising is called deep noise suppression (DNS). DNS methods are typically trained with mean squared error (MSE) type loss functions, which do not guarantee good perceptual quality. Perceptual evaluation of speech quality (PESQ) is a widely used metric for evaluating speech quality. However, the original PESQ algorithm is non-differentiable and therefore cannot be used directly as an optimization criterion for gradient-based learning. In this work, we propose an end-to-end non-intrusive PESQNet DNN to estimate the PESQ scores of the enhanced speech signal. By providing a reference-free perceptual loss, it serves as a mediator for DNS training, allowing the PESQ score of the enhanced speech signal to be maximized. We illustrate the potential of our proposed PESQNet-mediated training on the basis of an…
Taxonomy
Topics: Speech and Audio Processing · Ultrasonics and Acoustic Wave Propagation · Speech Recognition and Synthesis
