Deep Long Short-Term Memory Adaptive Beamforming Networks For Multichannel Robust Speech Recognition

Zhong Meng; Shinji Watanabe; John R. Hershey; Hakan Erdogan

arXiv:1711.08016 · eess.AS · October 17, 2018

TL;DR

This paper introduces an LSTM-based adaptive beamforming network that estimates beamforming filter coefficients in real time, improving far-field speech recognition in noisy, reverberant environments (a 7.97% absolute gain over baseline systems).
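
The key idea in the TL;DR, beamforming filters that change over time, can be illustrated with a minimal time-domain filter-and-sum sketch. It assumes each frame gets its own set of K-tap FIR filters, one per microphone; in the paper these coefficients would be predicted by the LSTM beamformer, whereas here they are simply passed in. The function name `adaptive_filter_and_sum`, the framing, and the tap count are hypothetical illustrations, not the paper's implementation.

```python
# Hypothetical sketch (not the paper's code): time-domain filter-and-sum
# beamforming with a different set of FIR filters for every frame.
from typing import List


def adaptive_filter_and_sum(
    x: List[List[float]],              # x[m][t]: M microphone signals of equal length
    filters: List[List[List[float]]],  # filters[f][m][k]: K-tap FIR for frame f, channel m
    frame_len: int,                    # samples per frame (assumed framing)
) -> List[float]:
    """Compute y[t] = sum_m sum_k filters[t // frame_len][m][k] * x[m][t - k].

    Samples before t = 0 are treated as zero.
    """
    num_samples = len(x[0])
    y = [0.0] * num_samples
    for t in range(num_samples):
        # Pick the filter set for this sample's frame; reuse the last set if
        # fewer filter sets than frames were supplied.
        f = min(t // frame_len, len(filters) - 1)
        acc = 0.0
        for m, channel in enumerate(x):
            for k, coeff in enumerate(filters[f][m]):
                if t - k >= 0:
                    acc += coeff * channel[t - k]
        y[t] = acc
    return y


if __name__ == "__main__":
    # Channel 1 is channel 0 delayed by one sample; delaying channel 0 by one
    # tap and averaging aligns the two copies (a delay-and-sum special case).
    x0 = [1.0, 2.0, 3.0, 4.0]
    x1 = [0.0, 1.0, 2.0, 3.0]
    print(adaptive_filter_and_sum([x0, x1], [[[0.0, 0.5], [0.5, 0.0]]], frame_len=4))
    # -> [0.0, 1.0, 2.0, 3.0]
```

A fixed beamformer would apply the same filters to every frame; letting the filters vary per frame is what allows an adaptive beamformer to track non-stationary noise and moving sources.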

Contribution

It presents a novel joint training scheme for an LSTM-based adaptive beamformer and a deep LSTM acoustic model, in which hidden units of the acoustic model assist in estimating the beamforming filter coefficients for robust speech recognition.

Findings

01 Achieves a 7.97% absolute gain over baseline systems.

02 Copes with non-stationary noise and time-varying room impulse responses by adaptively estimating the beamforming filters.

03 Demonstrates improved recognition on the CHiME-3 dataset.

Abstract

Far-field speech recognition in noisy and reverberant conditions remains a challenging problem despite recent deep learning breakthroughs. This problem is commonly addressed by acquiring a speech signal from multiple microphones and performing beamforming over them. In this paper, we propose to use a recurrent neural network with long short-term memory (LSTM) architecture to adaptively estimate real-time beamforming filter coefficients to cope with non-stationary environmental noise and the dynamic nature of source and microphone positions, which results in a set of time-varying room impulse responses. The LSTM adaptive beamformer is jointly trained with a deep LSTM acoustic model to predict senone labels. Further, we use hidden units in the deep LSTM acoustic model to assist in predicting the beamforming filter coefficients. The proposed system achieves 7.97% absolute gain over baseline…
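
The abstract's core mechanism, an LSTM beamformer trained jointly with a deep LSTM acoustic model whose hidden units feed back into filter estimation, can be outlined as a frame-by-frame forward pass. This is a minimal pure-Python sketch under assumptions the abstract does not state: the feedback is the top acoustic-model layer's hidden state from the previous frame, each channel's filter is reduced to a single gain (tap) per frame, the acoustic model consumes the raw enhanced frame instead of log-mel features, and biases are omitted. `joint_forward`, `LSTMCell`, and all dimensions are hypothetical. Only the forward pass and the senone cross-entropy loss are shown; joint training would minimize this loss with an automatic-differentiation framework so that gradients flow through the acoustic model into the beamformer.

```python
# Hypothetical sketch (not the paper's code): LSTM adaptive beamformer + deep LSTM
# acoustic model (AM), with the AM's hidden units fed back to the beamformer.
import math
import random
from typing import List, Tuple

Vec = List[float]
Mat = List[List[float]]


def rand_mat(rows: int, cols: int, rng: random.Random) -> Mat:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(w: Mat, v: Vec) -> Vec:
    return [sum(a * b for a, b in zip(row, v)) for row in w]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def softmax(z: Vec) -> Vec:
    top = max(z)  # subtract the max for numerical stability
    e = [math.exp(a - top) for a in z]
    total = sum(e)
    return [a / total for a in e]


class LSTMCell:
    """Standard LSTM cell with sigmoid gates and tanh squashing (biases omitted)."""

    def __init__(self, in_dim: int, hid_dim: int, rng: random.Random):
        # One weight matrix per gate, applied to the concatenation [x; h_prev].
        self.w = {g: rand_mat(hid_dim, in_dim + hid_dim, rng) for g in "ifog"}

    def step(self, x: Vec, h: Vec, c: Vec) -> Tuple[Vec, Vec]:
        z = x + h  # list concatenation: [x; h_prev]
        i = [sigmoid(a) for a in matvec(self.w["i"], z)]    # input gate
        f = [sigmoid(a) for a in matvec(self.w["f"], z)]    # forget gate
        o = [sigmoid(a) for a in matvec(self.w["o"], z)]    # output gate
        g = [math.tanh(a) for a in matvec(self.w["g"], z)]  # candidate cell state
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new


def joint_forward(frames: List[List[Vec]], labels: List[int], num_senones: int,
                  hid: int = 8, am_layers: int = 2, seed: int = 0) -> Tuple[List[Vec], float]:
    """frames[t][m] holds the samples of channel m in frame t; labels[t] is its senone id.

    Returns per-frame senone posteriors and the mean cross-entropy loss.
    """
    rng = random.Random(seed)
    num_ch, frame_len = len(frames[0]), len(frames[0][0])
    # Beamformer input: flattened multichannel frame plus the fed-back AM hidden units.
    bf = LSTMCell(num_ch * frame_len + hid, hid, rng)
    w_gain = rand_mat(num_ch, hid, rng)  # beamformer state -> one filter gain per channel
    am = [LSTMCell(frame_len if layer == 0 else hid, hid, rng) for layer in range(am_layers)]
    w_out = rand_mat(num_senones, hid, rng)  # top AM layer -> senone logits

    zeros = [0.0] * hid
    bf_state = (zeros, zeros)
    am_state = [(zeros, zeros) for _ in am]
    posteriors, loss = [], 0.0
    for frame, label in zip(frames, labels):
        feedback = am_state[-1][0]  # top AM layer's hidden units from the previous frame
        flat = [s for channel in frame for s in channel]
        bf_state = bf.step(flat + feedback, *bf_state)
        gains = matvec(w_gain, bf_state[0])  # time-varying single-tap filters
        # Filter-and-sum: scale each channel by its gain, then sum across channels.
        x = [sum(gains[m] * frame[m][n] for m in range(num_ch)) for n in range(frame_len)]
        for layer, cell in enumerate(am):
            am_state[layer] = cell.step(x, *am_state[layer])
            x = am_state[layer][0]
        p = softmax(matvec(w_out, x))
        posteriors.append(p)
        loss -= math.log(p[label])  # senone cross-entropy
    return posteriors, loss / len(frames)


if __name__ == "__main__":
    data_rng = random.Random(1)
    # 5 frames, 2 microphones, 4 samples per frame (toy random data).
    frames = [[[data_rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)] for _ in range(5)]
    post, loss = joint_forward(frames, labels=[0, 1, 2, 1, 0], num_senones=3)
    print(len(post), round(loss, 3))
```

Because the beamformer at frame t sees the acoustic model's state from frame t-1, the feedback path adds no circular dependency within a frame.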

Taxonomy

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory