Improved MVDR Beamforming Using LSTM Speech Models to Clean Spatial Clustering Masks
Zhaoheng Ni, Felix Grezes, Viet Anh Trinh, Michael I. Mandel

TL;DR
This paper combines spatial clustering and LSTM speech models to improve multi-channel noise reduction, enhancing speech quality and recognition accuracy in noisy environments.
Contribution
It trains LSTM speech models to clean the masks produced by the MESSL spatial clustering method, combining the spatial separation and generality of multi-channel clustering with the signal modeling of single-channel LSTM speech enhancers.
Findings
Increased speech quality measured by PESQ
Reduced word error rate on the CHiME-3 dataset
Outperforms default BeamformIt beamformer
Abstract
Spatial clustering techniques can achieve significant multi-channel noise reduction across relatively arbitrary microphone configurations, but have difficulty incorporating a detailed speech/noise model. In contrast, LSTM neural networks have successfully been trained to recognize speech from noise on single-channel inputs, but have difficulty taking full advantage of the information in multi-channel recordings. This paper integrates these two approaches, training LSTM speech models to clean the masks generated by the Model-based EM Source Separation and Localization (MESSL) spatial clustering method. By doing so, it attains both the spatial separation performance and generality of multi-channel spatial clustering and the signal modeling performance of multiple parallel single-channel LSTM speech enhancers. Our experiments show that when our system is applied to the CHiME-3 dataset of…
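To make the role of the masks concrete, the following is a minimal sketch of how a time-frequency speech mask can drive an MVDR beamformer. It is not the authors' implementation: it assumes a common mask-based formulation in which mask-weighted spatial covariance matrices are estimated per frequency and the filter is computed relative to a reference microphone. The function name `mask_based_mvdr`, the array layout, and the parameters `ref_channel` and `eps` are illustrative assumptions. The mask passed in could be a spatial clustering mask, an LSTM-cleaned mask, or any other estimate with values in [0, 1].

```python
import numpy as np


def mask_based_mvdr(stft, speech_mask, ref_channel=0, eps=1e-10):
    """Apply a mask-driven MVDR beamformer (illustrative sketch).

    stft:        complex array, shape (channels, frames, freqs)
    speech_mask: real array in [0, 1], shape (frames, freqs)
    Returns a single-channel complex STFT, shape (frames, freqs).
    """
    n_ch, n_frames, n_freqs = stft.shape
    noise_mask = 1.0 - speech_mask
    out = np.zeros((n_frames, n_freqs), dtype=complex)

    for f in range(n_freqs):
        X = stft[:, :, f]          # (channels, frames)
        ms = speech_mask[:, f]     # (frames,)
        mn = noise_mask[:, f]

        # Mask-weighted spatial covariance estimates, each (channels, channels).
        phi_s = (ms * X) @ X.conj().T / (ms.sum() + eps)
        phi_n = (mn * X) @ X.conj().T / (mn.sum() + eps)
        phi_n = phi_n + eps * np.eye(n_ch)  # small diagonal loading

        # Reference-channel MVDR filter:
        #   w = (phi_n^{-1} phi_s) u_ref / trace(phi_n^{-1} phi_s)
        ratio = np.linalg.solve(phi_n, phi_s)
        w = ratio[:, ref_channel] / (np.trace(ratio) + eps)

        # Beamformer output for this frequency: w^H x for every frame.
        out[:, f] = w.conj() @ X

    return out
```

In this sketch the quality of the mask directly determines the quality of the covariance estimates, which is why cleaning the mask before beamforming can matter.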
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Data Compression Techniques
Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory
