Normalized Features for Improving the Generalization of DNN Based Speech Enhancement
Robert Rehr, Timo Gerkmann

TL;DR
This paper introduces normalized features based on SNR estimates to improve the generalization of deep neural network speech enhancement models, leading to better noise suppression in unseen conditions.
Contribution
It proposes a novel feature set combining a priori and a posteriori SNRs to enhance DNN-based speech enhancement and improve generalization to unknown noise types.
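As an illustration of what such features look like, the sketch below computes the two SNR quantities for a single frequency bin. It is not the authors' implementation: the summary does not say which estimators the paper uses, so the decision-directed a priori SNR estimator, the Wiener gain, the smoothing constant `alpha`, the floor `floor_db`, and the function name `snr_features` are all assumptions made for this example. The a posteriori SNR is the noisy periodogram divided by the noise power estimate; the a priori SNR is the ratio of speech power to noise power.

```python
import math


def snr_features(noisy_power, noise_psd, alpha=0.98, floor_db=-25.0):
    """Return per-frame (a priori SNR, a posteriori SNR) in dB for one frequency bin.

    noisy_power: sequence of |Y|^2 values, one per frame.
    noise_psd:   sequence of noise power estimates, one per frame (must be > 0).
    alpha, floor_db: assumed decision-directed smoothing and SNR floor values.
    """
    xi_min = 10.0 ** (floor_db / 10.0)
    prev_speech_power = 0.0  # |S_hat|^2 from the previous frame
    features = []
    for y2, n in zip(noisy_power, noise_psd):
        # A posteriori SNR: noisy power relative to the noise power estimate.
        gamma = y2 / n
        # Decision-directed a priori SNR: mix of the previous speech estimate
        # and the instantaneous estimate max(gamma - 1, 0).
        xi = alpha * prev_speech_power / n + (1.0 - alpha) * max(gamma - 1.0, 0.0)
        xi = max(xi, xi_min)
        # Wiener gain, used here only to obtain the speech power estimate
        # that feeds the next frame's a priori SNR.
        gain = xi / (1.0 + xi)
        prev_speech_power = gain * gain * y2
        features.append(
            (10.0 * math.log10(xi), 10.0 * math.log10(max(gamma, 1e-12)))
        )
    return features


# Usage: the same signal at two different levels gives the same features,
# because both quantities are ratios to the noise power.
power = [1.0, 4.0, 0.5, 9.0]
noise = [1.0, 1.0, 1.0, 1.0]
quiet = snr_features(power, noise)
loud = snr_features([4.0 * p for p in power], [4.0 * n for n in noise])
```

Because both features are ratios relative to the noise power, scaling the input level leaves them unchanged, which is one sense in which SNR-based inputs are normalized.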
Findings
Significant PESQ and STOI improvements in unseen noise conditions.
Enhanced generalization of DNN models with proposed features.
Listening tests confirm better speech quality and intelligibility.
Abstract
Enhancing noisy speech is an important task to restore its quality and to improve its intelligibility. In traditional non-machine-learning (ML) based approaches the parameters required for noise reduction are estimated blindly from the noisy observation while the actual filter functions are derived analytically based on statistical assumptions. Even though such approaches generalize well to many different acoustic conditions, the noise suppression capability in transient noises is low. To amend this shortcoming, machine-learning (ML) methods such as deep learning have been employed for speech enhancement. However, due to their data-driven nature, the generalization of ML based approaches to unknown noise types is still discussed. To improve the generalization of ML based algorithms and to enhance the noise suppression of non-ML based methods, we propose a combination of both approaches.…
Taxonomy
Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Advanced Adaptive Filtering Techniques
