An improved DNN-based spectral feature mapping that removes noise and reverberation for robust automatic speech recognition

Juan Pablo Escudero; José Novoa; Rodrigo Mahu; Jorge Wuth; Fernando Huenupán; Richard Stern; Néstor Becerra Yoma

arXiv:1803.09016·eess.AS·April 5, 2018

TL;DR

This paper proposes an improved DNN-based spectral feature mapping method that reduces the effects of noise and reverberation, improving speech recognition accuracy, especially when combined with the weighted prediction error (WPE) method.

Contribution

The paper introduces modifications to DNN training for better noise and reverberation removal, reducing WER by 18.3% overall in noisy, reverberant environments.

Findings

01

The DNN achieves an average WER reduction of 4.5% over the baseline at SNRs of -6 dB, -3 dB, 0 dB and 3 dB, but only 0.8% at 6 dB and 9 dB

02

Combining the DNN with WPE yields a WER reduction of approximately 11% compared with the baseline

03

Modified DNN training results in 18.3% WER reduction overall

Abstract

Reverberation and additive noise have detrimental effects on the performance of automatic speech recognition systems. In this paper we explore the ability of a DNN-based spectral feature mapping to remove the effects of reverberation and additive noise. Experiments with the CHiME-2 database show that this DNN can achieve an average reduction in WER of 4.5%, when compared to the baseline system, at SNRs equal to -6 dB, -3 dB, 0 dB and 3 dB, and just 0.8% at greater SNRs of 6 dB and 9 dB. These results suggest that this DNN is more effective in removing additive noise than reverberation. To improve the DNN performance, we combine it with the weighted prediction error (WPE) method that shows a complementary behavior. While this combination provided a reduction in WER of approximately 11% when compared with the baseline, the observed improvement is not as great as that obtained using WPE…
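The figures above are all reductions in word error rate (WER). As a minimal sketch of how such numbers are typically computed, the Python below derives WER from a word-level edit distance and then compares a system against a baseline. The abstract does not say whether its reductions are absolute or relative, so both are shown. The function names and the toy sentences are hypothetical illustrations, not taken from the paper or from CHiME-2.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def absolute_reduction(baseline_wer: float, system_wer: float) -> float:
    """Difference in WER, in the same units as the inputs."""
    return baseline_wer - system_wer


def relative_reduction(baseline_wer: float, system_wer: float) -> float:
    """Reduction expressed as a fraction of the baseline WER."""
    return (baseline_wer - system_wer) / baseline_wer


if __name__ == "__main__":
    # Toy example: one deletion ("the") and one substitution ("on" -> "off")
    # against a four-word reference.
    print(word_error_rate("turn the light on", "turn light off"))  # 0.5
```

When reading reported reductions, it helps to check which of the two definitions a paper uses, since the same pair of WER values gives quite different numbers under each.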

Code & Models

Repositories

chrisjj12/Multi-Speaker-Identification

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing