An improved DNN-based spectral feature mapping that removes noise and reverberation for robust automatic speech recognition
Juan Pablo Escudero, José Novoa, Rodrigo Mahu, Jorge Wuth, Fernando Huenupán, Richard Stern, Néstor Becerra Yoma

TL;DR
This paper proposes an improved DNN-based spectral feature mapping method that reduces the effects of additive noise and reverberation on automatic speech recognition, and evaluates it on the CHiME-2 database both alone and in combination with the weighted prediction error (WPE) method.
Contribution
The paper introduces modifications to DNN training for better noise and reverberation removal, achieving notable WER reductions in noisy reverberant environments.
Findings
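The DNN achieves an average WER reduction of 4.5% over the baseline at SNRs from -6 dB to 3 dB, but only 0.8% at 6 dB and 9 dB
Combining the DNN with WPE yields a WER reduction of approximately 11% over the baseline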
Modified DNN training results in 18.3% WER reduction overall
Abstract
Reverberation and additive noise have detrimental effects on the performance of automatic speech recognition systems. In this paper we explore the ability of a DNN-based spectral feature mapping to remove the effects of reverberation and additive noise. Experiments with the CHiME-2 database show that this DNN can achieve an average reduction in WER of 4.5%, when compared to the baseline system, at SNRs equal to -6 dB, -3 dB, 0 dB and 3 dB, and just 0.8% at greater SNRs of 6 dB and 9 dB. These results suggest that this DNN is more effective in removing additive noise than reverberation. To improve the DNN performance, we combine it with the weighted prediction error (WPE) method that shows a complementary behavior. While this combination provided a reduction in WER of approximately 11% when compared with the baseline, the observed improvement is not as great as that obtained using WPE…
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
