Environmental Noise Embeddings for Robust Speech Recognition

Suyoun Kim; Bhiksha Raj; Ian Lane

arXiv:1601.02553·cs.CL·October 3, 2016·22 cites

TL;DR

This paper introduces a novel noise-aware deep neural network architecture that explicitly models environmental noise to improve speech recognition accuracy in noisy and reverberant conditions.

Contribution

The paper presents a new deep neural network architecture that incorporates environmental noise knowledge via a discriminative embedding for robust speech recognition.

Findings

01 Significantly improves recognition accuracy in noisy and highly reverberant environments.

02 Outperforms multi-condition training, noise-aware training, the i-vector framework, and multi-task learning on both in-domain and unseen noise.

03 Evaluated on Resource Management, CHiME-3, and Aurora4.

Abstract

We propose a novel deep neural network architecture for speech recognition that explicitly employs knowledge of the background environmental noise within a deep neural network acoustic model. A deep neural network is used to predict the acoustic environment in which the system is being used. The discriminative embedding generated at the bottleneck layer of this network is then concatenated with traditional acoustic features as input to a deep neural network acoustic model. Through a series of experiments on Resource Management, the CHiME-3 task, and Aurora4, we show that the proposed approach significantly improves speech recognition accuracy in noisy and highly reverberant environments, outperforming multi-condition training, noise-aware training, the i-vector framework, and multi-task learning on both in-domain noise and unseen noise.
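The data flow described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the layer sizes, the ReLU activations, the per-frame embedding, the random (untrained) weights, and every name below (`noise_embedding`, `acoustic_model_input`, and so on) are assumptions chosen only to show how a bottleneck embedding from a noise-classification network might be concatenated with acoustic features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; none of these values come from the paper.
FEAT_DIM = 40          # acoustic feature dimension per frame
HIDDEN_DIM = 256       # hidden layer width of the noise classifier
BOTTLENECK_DIM = 32    # size of the noise embedding
NUM_NOISE_TYPES = 4    # number of environment classes to predict


def relu(x):
    return np.maximum(x, 0.0)


def init_layer(n_in, n_out):
    # Random weights stand in for parameters that would be learned in training.
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)


# Noise-classification network: features -> hidden -> bottleneck -> class logits.
W1, b1 = init_layer(FEAT_DIM, HIDDEN_DIM)
W2, b2 = init_layer(HIDDEN_DIM, BOTTLENECK_DIM)
W3, b3 = init_layer(BOTTLENECK_DIM, NUM_NOISE_TYPES)


def noise_embedding(features):
    """Return the bottleneck-layer activations for each frame."""
    hidden = relu(features @ W1 + b1)
    return relu(hidden @ W2 + b2)


def noise_logits(features):
    """Environment-class scores; this output would drive classifier training."""
    return noise_embedding(features) @ W3 + b3


def acoustic_model_input(features):
    """Concatenate acoustic features with the noise embedding, frame by frame."""
    return np.concatenate([features, noise_embedding(features)], axis=1)


frames = rng.normal(size=(100, FEAT_DIM))   # 100 synthetic frames
augmented = acoustic_model_input(frames)
print(augmented.shape)  # (100, 72)
```

In this sketch the acoustic model would receive 72-dimensional inputs (40 feature dimensions plus 32 embedding dimensions) instead of the raw 40-dimensional features. Whether the embedding is computed per frame or pooled over an utterance is not stated in the abstract, so the per-frame choice here is only one possibility.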

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing