DNN Based Speech Enhancement for Unseen Noises Using Monte Carlo Dropout
Nazreen P M, A G Ramakrishnan

TL;DR
This paper introduces a Bayesian approach using Monte Carlo dropout to improve the generalization of DNNs for speech enhancement, especially on unseen noise conditions, by estimating model uncertainty and dynamically selecting the best model.
Contribution
It applies MC dropout as a Bayesian estimator to enhance DNN generalization for unseen noises and proposes a dynamic model selection method based on estimated model precision.
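As a rough illustration of the MC dropout idea described above, the sketch below keeps dropout active at test time, runs several stochastic forward passes, and takes the sample mean as the enhanced output and the sample variance as an uncertainty estimate. The network shape, the dropout rate, the number of passes, and the function name `mc_dropout_predict` are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def mc_dropout_predict(x, weights, biases, p_drop=0.2, n_samples=50, rng=None):
    """Predictive mean and variance from stochastic forward passes.

    x: array of shape (n_frames, n_in), e.g. noisy spectral features.
    weights, biases: per-layer parameters of a small feed-forward net
    (hypothetical; a real model would be trained first).
    """
    rng = np.random.default_rng() if rng is None else rng
    outputs = []
    for _ in range(n_samples):
        h = x
        for i, (W, b) in enumerate(zip(weights, biases)):
            # Dropout stays ON at test time: a fresh mask for every pass.
            # Applying it to each layer's input is an assumption here.
            mask = rng.random(h.shape) >= p_drop
            h = h * mask / (1.0 - p_drop)
            h = h @ W + b
            if i < len(weights) - 1:
                h = np.maximum(h, 0.0)  # ReLU on hidden layers only
        outputs.append(h)
    outputs = np.stack(outputs)  # (n_samples, n_frames, n_out)
    return outputs.mean(axis=0), outputs.var(axis=0)

# Toy usage with random (untrained) parameters, for shape checking only.
rng = np.random.default_rng(0)
sizes = [8, 16, 8]
weights = [rng.normal(scale=0.1, size=(a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(b) for b in sizes[1:]]
noisy = rng.normal(size=(5, 8))
mean, var = mc_dropout_predict(noisy, weights, biases, rng=rng)
```

The variance returned here is only the spread across dropout samples; how the paper turns that spread into a model precision value is not specified in this summary.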
Findings
MC dropout improves speech enhancement on unseen noises.
Dynamic model selection based on estimated precision enhances performance.
The approach generalizes better than a conventional DNN to both unseen noise types and unseen SNR conditions.
Abstract
In this work, we propose the use of dropouts as a Bayesian estimator for increasing the generalizability of a deep neural network (DNN) for speech enhancement. By using Monte Carlo (MC) dropout, we show that the DNN performs better enhancement in unseen noise and SNR conditions. The DNN is trained on speech corrupted with Factory2, M109, Babble, Leopard and Volvo noises at SNRs of 0, 5 and 10 dB and tested on speech with white, pink and factory1 noises. Speech samples are obtained from the TIMIT database and noises from NOISEX-92. In another experiment, we train five DNN models separately on speech corrupted with Factory2, M109, Babble, Leopard and Volvo noises, at 0, 5 and 10 dB SNRs. The model precision (estimated using MC dropout) is used as a proxy for squared error to dynamically select the best of the DNN models based on their performance on each frame of test data.
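The per-frame model selection in the second experiment can be sketched as follows. Each noise-specific model is assumed to have already produced, for every test frame, an MC dropout mean and variance; the sketch then picks, frame by frame, the model with the lowest uncertainty (that is, the highest estimated precision). Averaging the variance over features to get one score per frame, and the name `select_per_frame`, are assumptions of this example rather than details confirmed by the abstract.

```python
import numpy as np

def select_per_frame(means, variances):
    """Pick, for each frame, the output of the least uncertain model.

    means, variances: arrays of shape (n_models, n_frames, n_features)
    holding MC dropout estimates from each separately trained model.
    """
    # One uncertainty score per model and frame (assumed: mean over features).
    frame_uncertainty = variances.mean(axis=2)   # (n_models, n_frames)
    # Lowest variance stands in for highest precision / lowest squared error.
    best = frame_uncertainty.argmin(axis=0)      # (n_frames,)
    frames = np.arange(means.shape[1])
    return means[best, frames], best

# Toy example: two models, three frames, two features per frame.
means = np.stack([np.full((3, 2), 1.0), np.full((3, 2), 2.0)])
variances = np.array([
    [[0.1, 0.1], [0.5, 0.5], [0.2, 0.2]],  # model 0
    [[0.3, 0.3], [0.1, 0.1], [0.2, 0.2]],  # model 1
])
enhanced, best = select_per_frame(means, variances)
```

In this toy case model 0 is chosen for the first frame and model 1 for the second; on the tied third frame `argmin` falls back to the first model.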
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
