Exploring the robustness of features and enhancement on speech recognition systems in highly-reverberant real environments

José Novoa; Juan Pablo Escudero; Jorge Wuth; Victor Poblete; Simon King; Richard Stern; Néstor Becerra Yoma

arXiv:1803.09013 · eess.AS · March 28, 2018

TL;DR

This study assesses the robustness of a DNN-HMM speech recognition system in highly reverberant real environments, comparing combinations of feature extraction and speech enhancement techniques under clean and reverberant training conditions.
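The summary does not say how the reverberated (Reverb) training set was produced. A common approach, assumed here, is to convolve clean utterances with room impulse responses (RIRs). The sketch below uses a synthetic RIR (Gaussian noise under an exponential decay set by a reverberation time `rt60`) instead of a measured one; `synthetic_rir`, `reverberate` and all parameter values are hypothetical illustrations, not the authors' pipeline.

```python
import numpy as np


def synthetic_rir(rt60: float, fs: int, length_s: float, seed: int = 0) -> np.ndarray:
    """Hypothetical stand-in for a measured RIR: Gaussian noise with an exponential decay."""
    rng = np.random.default_rng(seed)
    n = int(round(length_s * fs))
    t = np.arange(n) / fs
    # Amplitude falls by a factor of 1000 (60 dB in energy) every rt60 seconds.
    envelope = 10.0 ** (-3.0 * t / rt60)
    h = rng.standard_normal(n) * envelope
    h[0] = 1.0  # direct-path component
    return h / np.max(np.abs(h))  # peak-normalize


def reverberate(clean: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Convolve clean speech with an RIR, keeping the original length and peak level."""
    wet = np.convolve(clean, rir)[: len(clean)]
    peak = np.max(np.abs(wet))
    if peak == 0.0:
        return wet
    return wet * (np.max(np.abs(clean)) / peak)


# Usage: turn a placeholder "clean" waveform into a reverberated training example.
fs = 16000
clean = np.random.default_rng(1).standard_normal(fs)  # 1 s of placeholder audio
reverb = reverberate(clean, synthetic_rir(rt60=0.8, fs=fs, length_s=1.0))
```

Truncating the convolution to the clean length keeps utterance and label alignment unchanged; a real setup would more likely draw from many measured or simulated RIRs.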

Contribution

It evaluates combinations of two feature types (LNFB and MelFB) with three enhancement methods (NMF, SSF and WPE) for DNN-HMM speech recognition in reverberant settings, and shows how the training condition (clean or reverberated) changes their relative performance.
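For reference, MelFB features are log energies of triangular filters spaced evenly on the mel scale. The sketch below assumes a typical front end (16 kHz audio, 25 ms Hamming windows, 10 ms hop, 512-point FFT, 40 filters); the paper's actual configuration is not given in this summary. LNFB, which locally normalizes the filter-bank outputs, is not sketched because its exact definition is not stated here.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_filters: int, n_fft: int, fs: int) -> np.ndarray:
    """Triangular filters equally spaced on the mel scale from 0 Hz to fs/2."""
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2), n_filters + 2)
    hz_edges = mel_to_hz(mel_edges)
    bin_freqs = np.arange(n_fft // 2 + 1) * fs / n_fft
    fb = np.zeros((n_filters, bin_freqs.size))
    for i in range(n_filters):
        left, center, right = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_features(signal, fs=16000, win=400, hop=160, n_fft=512, n_filters=40):
    """Log mel filter-bank energies, one row per 25 ms frame (10 ms hop at 16 kHz)."""
    window = np.hamming(win)
    n_frames = 1 + (len(signal) - win) // hop
    frames = np.stack([signal[k * hop : k * hop + win] * window for k in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    energies = power @ mel_filterbank(n_filters, n_fft, fs).T
    return np.log(energies + 1e-10)  # small floor avoids log(0)
```

Each row of the output is the feature vector for one frame; a DNN-HMM front end would typically stack several neighbouring rows as network input.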

Findings

01

With reverberant training, WPE combined with LNFB yields lower average WERs than SSF and NMF.

02

LNFB outperforms MelFB under clean training conditions.

03

Different methods complement each other depending on environment and training.
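All of the findings are stated in word error rate (WER): the minimum number of word substitutions, deletions and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. A minimal sketch using standard Levenshtein alignment over words (the function name is illustrative; the paper's scoring tool is not named here):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # d[i][j] = minimum edits turning the first i reference words into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(substitution, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)


# One substitution (sat -> sit) and one deletion (the): 2 errors over 6 reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```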

Abstract

This paper evaluates the robustness of a DNN-HMM-based speech recognition system in highly-reverberant real environments using the HRRE database. The performance of locally-normalized filter bank (LNFB) and Mel filter bank (MelFB) features in combination with Non-negative Matrix Factorization (NMF), Suppression of Slowly-varying components and the Falling edge (SSF) and Weighted Prediction Error (WPE) enhancement methods is discussed and evaluated. Two training conditions were considered: clean and reverberated (Reverb). With Reverb training, WPE with LNFB provides WERs that are 3% and 20% lower on average than SSF and NMF, respectively. WPE with MelFB provides WERs that are 11% and 24% lower on average than SSF and NMF, respectively. With clean training, which represents a significant mismatch between testing and training conditions, LNFB features clearly outperform MelFB…
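The abstract reports WERs that are "3% and 20% lower on average" without saying, in the text shown, whether these are relative or absolute (percentage-point) reductions, or whether the average is taken per condition or over pooled WERs. The sketch below computes the common readings side by side; the helper names and the two-condition WER values are hypothetical and are not taken from the paper.

```python
from statistics import mean


def relative_reduction(baseline_wer: float, system_wer: float) -> float:
    """Percentage of the baseline WER removed by the system (relative reduction)."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - system_wer) / baseline_wer


def absolute_reduction(baseline_wer: float, system_wer: float) -> float:
    """Difference in WER percentage points (absolute reduction)."""
    return baseline_wer - system_wer


def averaged_reductions(baseline_wers, system_wers):
    """Different 'on average' summaries over per-condition WERs (e.g. one per room)."""
    pairs = list(zip(baseline_wers, system_wers))
    return {
        "mean_of_relative": mean(relative_reduction(b, s) for b, s in pairs),
        "relative_of_means": relative_reduction(mean(baseline_wers), mean(system_wers)),
        "mean_absolute": mean(absolute_reduction(b, s) for b, s in pairs),
    }


# Hypothetical per-condition WERs (%), NOT taken from the paper.
summary = averaged_reductions([20.0, 40.0], [10.0, 36.0])
print(summary)
```

With these made-up numbers the mean of per-condition relative reductions is 30%, while the relative reduction of the mean WERs is about 23.3%, so the averaging convention matters when comparing figures across papers.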

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing