Employing Real Training Data for Deep Noise Suppression
Ziyi Xu, Marvin Sach, Jan Pirklbauer, Tim Fingscheidt

TL;DR
This paper introduces PESQ-DNN, a neural network that estimates speech quality scores without needing clean speech references, enabling the use of real-world data for training deep noise suppression models and improving their performance.
Contribution
The paper presents a novel reference-free perceptual loss using PESQ-DNN and an epoch-wise training protocol to effectively incorporate real data into DNS model training.
Findings
Outperforms synthetic-only training methods in PESQ scores.
Significantly surpasses Interspeech 2021 DNS Challenge baseline.
Achieves higher DNSMOS scores on real and synthetic data.
Abstract
Most deep noise suppression (DNS) models are trained with reference-based losses requiring access to clean speech. However, sometimes an additive microphone model is insufficient for real-world applications. Accordingly, ways to use real training data in supervised learning for DNS models promise to reduce a potential training/inference mismatch. Employing real data for DNS training requires either generative approaches or a reference-free loss without access to the corresponding clean speech. In this work, we propose to employ an end-to-end non-intrusive deep neural network (DNN), named PESQ-DNN, to estimate perceptual evaluation of speech quality (PESQ) scores of enhanced real data. It provides a reference-free perceptual loss for employing real data during DNS training, maximizing the PESQ scores. Furthermore, we use an epoch-wise alternating training protocol, updating the DNS model…
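The two ideas in the abstract, a reference-free loss that maximizes a predicted PESQ score and an epoch-wise alternating protocol, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the use of the negated mean score as the loss, and the strict even/odd alternation between the DNS model and PESQ-DNN. The abstract is truncated before it describes what alternates, so this is not the authors' exact procedure.

```python
from typing import List, Sequence


def reference_free_loss(predicted_pesq: Sequence[float]) -> float:
    """Hypothetical loss: negated mean of the predicted PESQ scores.

    Minimizing this value maximizes the predicted PESQ, and no clean
    reference signal is needed to compute it.
    """
    if not predicted_pesq:
        raise ValueError("need at least one predicted score")
    return -sum(predicted_pesq) / len(predicted_pesq)


def alternating_schedule(num_epochs: int) -> List[str]:
    """Hypothetical epoch-wise schedule: which network is updated per epoch.

    Assumption: even epochs update the DNS model, odd epochs update the
    quality estimator. The real protocol may differ.
    """
    return ["dns" if epoch % 2 == 0 else "pesq_dnn" for epoch in range(num_epochs)]


if __name__ == "__main__":
    # Example: scores a quality estimator might predict for a batch.
    print(reference_free_loss([2.0, 3.0]))
    print(alternating_schedule(4))
```

In a real training loop, the loss would be computed on differentiable network outputs so that gradients flow back through the frozen quality estimator into the DNS model; plain floats are used here only to keep the sketch self-contained.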
Taxonomy
Topics: Speech and Audio Processing · Acoustic Wave Phenomena Research · Hearing Loss and Rehabilitation
