On combining features for single-channel robust speech recognition in reverberant environments

José Novoa, Josué Fredes, Jorge Wuth, Fernando Huenupán, Richard M. Stern, Nestor Becerra Yoma

arXiv:1906.07299 · eess.AS · June 19, 2019


TL;DR

This study explores combining multiple speech recognition systems to improve accuracy in highly reverberant environments, demonstrating significant WER reductions through feature combination and system fusion techniques.

Contribution

The paper introduces a method for combining features at different levels of the recognition system to make speech recognition more robust in real reverberant settings.

Findings

01

WER improvements between 7% and 18% in real environments

02

DNN-output level combination is more effective than system-output level

03

Cascading both combination schemes yields further WER reduction

Abstract

This paper addresses the combination of complementary parallel speech recognition systems to reduce the error rate of ASR operating in real, highly reverberant environments. First, the testing environment consists of recordings of speech in a calibrated real room with reverberation times from 0.47 to 1.77 seconds and speaker-to-microphone distances of 0.16 to 2.56 meters. We combined systems both at the level of the DNN outputs and at the level of the final ASR outputs. Second, recognition experiments with the REVERB challenge are also reported. The results presented here show that the combination of features can lead to WER improvements between 7% and 18% with speech recorded in real reverberant environments. Also, combination at the DNN-output level is much more effective than at the system-output level. However, cascading both schemes can still lead to smaller…
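The abstract distinguishes combination at the DNN-output level from combination at the system-output level, but this page does not state the exact combination rules the authors used. As a rough illustration only, the sketch below assumes two common stand-ins. For the DNN-output level, it uses a weighted log-linear average of frame-level state posteriors from parallel systems. For the system-output level, it uses a simplified ROVER-style majority vote over word hypotheses that are assumed to be already aligned. The function names, the uniform default weights, and the empty-string deletion marker are all hypothetical choices made for this example.

```python
import numpy as np
from collections import Counter


def combine_dnn_posteriors(posteriors, weights=None):
    """Hypothetical DNN-output-level combination (log-linear averaging).

    posteriors: list of arrays, one per parallel system, each of shape
                (num_frames, num_states) with rows summing to 1.
    weights:    optional per-system weights (assumed; uniform by default).
    Returns combined log-posteriors of shape (num_frames, num_states).
    """
    stacked = np.stack(posteriors)  # (num_systems, T, S)
    n = stacked.shape[0]
    if weights is None:
        weights = np.full(n, 1.0 / n)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    # Average in the log domain; clip to avoid log(0).
    log_post = np.log(np.clip(stacked, 1e-12, None))
    combined = np.tensordot(weights, log_post, axes=1)  # (T, S)
    # Renormalise each frame so that exp(combined) sums to 1.
    combined -= np.logaddexp.reduce(combined, axis=1, keepdims=True)
    return combined


def vote_aligned_hypotheses(hypotheses):
    """Hypothetical system-output-level combination (simplified ROVER).

    hypotheses: list of equal-length word lists, assumed already aligned
                slot by slot; '' marks a deletion in that system's output.
    Returns the majority word in each slot, dropping winning deletions.
    """
    result = []
    for slot in zip(*hypotheses):
        word, _ = Counter(slot).most_common(1)[0]
        if word:  # skip slots where "no word" wins the vote
            result.append(word)
    return result
```

In this sketch the DNN-level rule can still correct frame-level decisions before decoding. The system-level vote can only choose among words that some system already produced. That difference is one plausible intuition for the abstract's finding that DNN-output combination is more effective, though it is not necessarily the authors' explanation.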

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing