Improved MVDR Beamforming Using LSTM Speech Models to Clean Spatial Clustering Masks

Zhaoheng Ni; Felix Grezes; Viet Anh Trinh; Michael I. Mandel

arXiv:2012.02191 · cs.SD · December 7, 2020 · 1 citation

Open Access

TL;DR

This paper combines spatial clustering and LSTM speech models to improve multi-channel noise reduction, enhancing speech quality and recognition accuracy in noisy environments.

Contribution

It introduces a novel integration of LSTM speech models with spatial clustering masks, leveraging their combined strengths for better noise suppression.

Findings

01. Increased speech quality measured by PESQ
02. Reduced word error rate on CHiME-3 dataset
03. Outperforms default BeamformIt beamformer

Abstract

Spatial clustering techniques can achieve significant multi-channel noise reduction across relatively arbitrary microphone configurations, but have difficulty incorporating a detailed speech/noise model. In contrast, LSTM neural networks have successfully been trained to distinguish speech from noise on single-channel inputs, but have difficulty taking full advantage of the information in multi-channel recordings. This paper integrates these two approaches, training LSTM speech models to clean the masks generated by the Model-based EM Source Separation and Localization (MESSL) spatial clustering method. By doing so, it attains both the spatial separation performance and generality of multi-channel spatial clustering and the signal modeling performance of multiple parallel single-channel LSTM speech enhancers. Our experiments show that when our system is applied to the CHiME-3 dataset of…
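
To make the role of the masks concrete, here is a minimal sketch of how a time-frequency speech mask can drive an MVDR beamformer. It is not the authors' implementation: the function name `mask_based_mvdr`, the array layout, the diagonal loading, and the reference-channel (Souden-style) MVDR formulation are all assumptions chosen for illustration. The mask is taken as given; in the paper's setting it would be a MESSL mask after cleaning by the LSTM speech model.

```python
import numpy as np


def mask_based_mvdr(stft, speech_mask, ref_channel=0, load=1e-6, eps=1e-8):
    """Hypothetical helper: mask-driven MVDR beamforming.

    stft        : complex array, shape (channels, frames, freqs)
    speech_mask : real array in [0, 1], shape (frames, freqs)
    Returns the enhanced single-channel STFT, shape (frames, freqs).
    """
    n_ch, n_frames, n_freqs = stft.shape
    noise_mask = 1.0 - speech_mask
    enhanced = np.zeros((n_frames, n_freqs), dtype=complex)

    for f in range(n_freqs):
        x = stft[:, :, f]                      # (channels, frames)
        m_s = speech_mask[:, f]
        m_n = noise_mask[:, f]

        # Mask-weighted spatial covariance matrices for speech and noise.
        phi_s = (m_s * x) @ x.conj().T / (m_s.sum() + eps)
        phi_n = (m_n * x) @ x.conj().T / (m_n.sum() + eps)

        # Diagonal loading (an assumption) keeps phi_n invertible.
        phi_n = phi_n + (load * np.trace(phi_n).real / n_ch + eps) * np.eye(n_ch)

        # Reference-channel MVDR: w = (phi_n^-1 phi_s) u / trace(phi_n^-1 phi_s).
        ratio = np.linalg.solve(phi_n, phi_s)
        w = ratio[:, ref_channel] / (np.trace(ratio) + eps)

        # Apply the beamformer: y(t) = w^H x(t).
        enhanced[:, f] = w.conj() @ x

    return enhanced
```

In this formulation the mask only affects the two covariance estimates, so a cleaner mask translates directly into better separated speech and noise statistics and, under these assumptions, a beamformer that suppresses more noise while leaving the reference-channel speech undistorted.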

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Data Compression Techniques

Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory