Robust coherence-based spectral enhancement for speech recognition in adverse real-world environments
Hendrik Barfuss, Christian Huemmer, Andreas Schwarz, Walter Kellermann

TL;DR
This paper shows that a coherence-based postfilter, applied to the beamformer output to remove diffuse interference components, further lowers the word error rates of a deep-learning speech recognition system in the noisy environments of CHiME-3.
Contribution
It extends the recently updated CHiME-3 baseline speech recognition system with a coherence-based postfilter applied to the beamformer output and investigates the postfilter's impact on word error rates in adverse real-world environments.
Findings
Significant reduction in word error rates in CHiME-3 environments.
Effective use of DOA-dependent and DOA-independent estimators.
Improved robustness of speech recognition in noisy, reverberant settings.
Abstract
Speech recognition in adverse real-world environments is highly affected by reverberation and nonstationary background noise. A well-known strategy to reduce such undesired signal components in multi-microphone scenarios is spatial filtering of the microphone signals. In this article, we demonstrate that an additional coherence-based postfilter, which is applied to the beamformer output signal to remove diffuse interference components from the latter, is an effective means to further improve the recognition accuracy of modern deep learning speech recognition systems. To this end, the recently updated 3rd CHiME Speech Separation and Recognition Challenge (CHiME-3) baseline speech recognition system is extended by a coherence-based postfilter and the postfilter's impact on the word error rates is investigated for the noisy environments provided by CHiME-3. To determine the time- and…
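The abstract is truncated before it describes how the postfilter gains are determined, so the following is only a minimal sketch of the general idea behind a coherence-based postfilter, not the authors' method. It assumes a two-microphone signal model in which the observed coherence is a mixture of a fully coherent desired signal and an ideal spherically diffuse noise field, solves that model for the coherent-to-diffuse ratio (CDR) given a known DOA, and maps the CDR to a Wiener-like gain. The function names, the gain rule, the gain floor `g_min`, and the microphone spacing are illustrative assumptions.

```python
import numpy as np


def diffuse_coherence(freqs_hz, mic_distance_m, c=343.0):
    """Coherence of an ideal spherically diffuse field: sin(x)/x, x = 2*pi*f*d/c."""
    # np.sinc(t) computes sin(pi*t)/(pi*t), so pass t = 2*f*d/c.
    return np.sinc(2.0 * freqs_hz * mic_distance_m / c)


def cdr_from_coherence(coh_x, coh_n, coh_s, eps=1e-12):
    """DOA-dependent CDR estimate from the assumed mixture model
    coh_x = (CDR * coh_s + coh_n) / (CDR + 1)."""
    denom = coh_x - coh_s
    # Guard against division by zero when the observation is fully coherent.
    denom = np.where(np.abs(denom) < eps, eps, denom)
    cdr = np.real((coh_n - coh_x) / denom)
    return np.maximum(cdr, 0.0)  # a power ratio cannot be negative


def postfilter_gain(cdr, g_min=0.1):
    """Wiener-like gain CDR / (CDR + 1), floored at g_min (assumed gain rule)."""
    return np.maximum(cdr / (cdr + 1.0), g_min)


if __name__ == "__main__":
    freqs = np.array([500.0, 1000.0, 2000.0])
    coh_n = diffuse_coherence(freqs, mic_distance_m=0.08)
    coh_s = np.ones_like(freqs)            # broadside source: zero time delay
    coh_x = (3.0 * coh_s + coh_n) / 4.0    # synthetic observation with CDR = 3
    cdr = cdr_from_coherence(coh_x, coh_n, coh_s)
    print(cdr, postfilter_gain(cdr))
```

In practice the gain would be multiplied onto each time-frequency bin of the beamformer output, with `coh_x` estimated from recursively averaged auto- and cross-power spectra of the microphone signals.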
