Environmental Noise Embeddings for Robust Speech Recognition
Suyoun Kim, Bhiksha Raj, Ian Lane

TL;DR
This paper introduces a novel noise-aware deep neural network architecture that explicitly models environmental noise to improve speech recognition accuracy in noisy and reverberant conditions.
Contribution
The paper presents a new deep neural network architecture that incorporates environmental noise knowledge via a discriminative embedding for robust speech recognition.
Findings
Significantly improves recognition accuracy in noisy and highly reverberant environments.
Outperforms multi-condition training, noise-aware training, the i-vector framework, and multi-task learning on both in-domain and unseen noise.
Effective on multiple datasets: Resource Management, CHiME-3, and Aurora4.
Abstract
We propose a novel deep neural network architecture for speech recognition that explicitly employs knowledge of the background environmental noise within a deep neural network acoustic model. A deep neural network is used to predict the acoustic environment in which the system is being used. The discriminative embedding generated at the bottleneck layer of this network is then concatenated with traditional acoustic features as input to a deep neural network acoustic model. Through a series of experiments on the Resource Management, CHiME-3, and Aurora4 tasks, we show that the proposed approach significantly improves speech recognition accuracy in noisy and highly reverberant environments, outperforming multi-condition training, noise-aware training, the i-vector framework, and multi-task learning on both in-domain noise and unseen noise.
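The data flow described in the abstract (an environment-classification network whose bottleneck activations are appended to the acoustic features) can be sketched as follows. This is a minimal, untrained illustration, not the authors' implementation: the class and function names, the layer sizes, the ReLU hidden layer, the sigmoid bottleneck, and the per-frame computation of the embedding are all hypothetical choices. In practice the environment network would first be trained to predict environment labels, and the concatenated vector would then feed a separately trained acoustic model.

```python
import math
import random


def dense(x, weights, bias):
    """Affine layer: returns W x + b for a single input vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (untrained, for illustration only)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class EnvironmentNet:
    """Hypothetical environment classifier: hidden -> bottleneck -> softmax over environments."""

    def __init__(self, feat_dim, hidden_dim, bottleneck_dim, n_envs, seed=0):
        rng = random.Random(seed)
        self.hidden = init_layer(feat_dim, hidden_dim, rng)
        self.bottleneck = init_layer(hidden_dim, bottleneck_dim, rng)
        self.output = init_layer(bottleneck_dim, n_envs, rng)

    def embed(self, frame):
        """Bottleneck activations, used as the environmental noise embedding."""
        h = relu(dense(frame, *self.hidden))
        return sigmoid(dense(h, *self.bottleneck))

    def classify(self, frame):
        """Posterior over environment classes (the training target of this network)."""
        return softmax(dense(self.embed(frame), *self.output))


def acoustic_model_input(frame, env_net):
    """Concatenate the acoustic features with the environment embedding."""
    return list(frame) + env_net.embed(frame)


if __name__ == "__main__":
    env_net = EnvironmentNet(feat_dim=40, hidden_dim=64, bottleneck_dim=8, n_envs=4)
    frame = [0.1] * 40  # placeholder feature vector, not real speech features
    x = acoustic_model_input(frame, env_net)
    print(len(x))  # 48
```

The key design point the sketch tries to capture is that only the bottleneck activations, not the environment posteriors, are passed on: the acoustic model's input grows by the bottleneck width (8 extra dimensions here), which keeps the added information compact.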
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
