On combining features for single-channel robust speech recognition in reverberant environments
José Novoa, Josué Fredes, Jorge Wuth, Fernando Huenupán, Richard M. Stern, Nestor Becerra Yoma

TL;DR
This study explores combining multiple speech recognition systems to improve accuracy in highly reverberant environments, demonstrating significant WER reductions through feature combination and system fusion techniques.
Contribution
It combines complementary parallel speech recognition systems at two levels, the DNN outputs and the final ASR outputs, to make recognition more robust in real reverberant settings.
Findings
WER improvements between 7% and 18% in real environments
DNN-output level combination is more effective than system-output level
Cascading both combination schemes yields further WER reduction
Abstract
This paper addresses the combination of complementary parallel speech recognition systems to reduce the error rate of speech recognition systems operating in real, highly reverberant environments. First, the testing environment consists of recordings of speech in a calibrated real room with reverberation times from 0.47 to 1.77 seconds and speaker-to-microphone distances of 0.16 to 2.56 meters. We combined systems both at the level of the DNN outputs and at the level of the final ASR outputs. Second, recognition experiments with the REVERB Challenge are also reported. The results presented here show that the combination of features can lead to WER improvements between 7% and 18% with speech recorded in real reverberant environments. Also, the combination at the DNN-output level is much more effective than at the system-output level. However, cascading both schemes can still lead to smaller…
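To make "combination at the level of the DNN outputs" concrete, here is a minimal sketch of one common way to fuse frame-level posteriors from parallel acoustic models: a weighted log-linear average, renormalized per frame. The abstract does not state which combination rule the authors used, so the rule itself, the function name `combine_posteriors`, and the weights are all assumptions made for illustration.

```python
import math
from typing import List, Sequence

Frame = List[float]  # posterior probabilities over output classes for one frame


def combine_posteriors(
    streams: Sequence[Sequence[Frame]],
    weights: Sequence[float],
) -> List[Frame]:
    """Fuse per-frame posteriors from several systems (hypothetical helper).

    Assumption: a weighted log-linear combination. The source abstract
    does not specify the actual rule used in the paper.
    """
    if not streams or len(streams) != len(weights):
        raise ValueError("need one weight per system")
    n_frames = len(streams[0])
    if any(len(s) != n_frames for s in streams):
        raise ValueError("all systems must cover the same frames")

    eps = 1e-12  # guards against log(0)
    combined: List[Frame] = []
    for t in range(n_frames):
        n_classes = len(streams[0][t])
        # Weighted sum of log-posteriors for each class.
        log_scores = [
            sum(w * math.log(s[t][k] + eps) for s, w in zip(streams, weights))
            for k in range(n_classes)
        ]
        # Renormalize with the usual max-shift for numerical stability.
        peak = max(log_scores)
        exps = [math.exp(x - peak) for x in log_scores]
        total = sum(exps)
        combined.append([e / total for e in exps])
    return combined


# Two hypothetical systems, one frame, two output classes.
system_a = [[0.9, 0.1]]
system_b = [[0.4, 0.6]]
fused = combine_posteriors([system_a, system_b], weights=[0.5, 0.5])
```

The fused posteriors would then be passed to a single decoder. Combination at the system-output level, by contrast, operates on the final word hypotheses of separately decoded systems.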
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
