Enhancement of Spatial Clustering-Based Time-Frequency Masks using LSTM Neural Networks
Felix Grezes, Zhaoheng Ni, Viet Anh Trinh, Michael Mandel

TL;DR
This paper combines LSTM neural networks with spatial clustering for multi-channel speech enhancement. LSTMs refine the time-frequency masks produced by spatial clustering, which pairs a strong learned signal model with the ability to handle varying microphone configurations.
Contribution
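It proposes using LSTM networks to enhance time-frequency masks estimated by multi-channel spatial clustering. This combines the signal modeling of single-channel LSTM enhancers with the separation performance and microphone-configuration independence of spatial clustering.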
Findings
Improved SDR and PESQ scores over baselines on CHiME-3.
Reduced word error rate in speech recognition.
Enhanced generalization to different microphone configurations.
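For reference, the SDR reported by the BSS_eval toolkit decomposes an estimated source into a target component plus interference, noise, and artifact errors, then compares their energies. The equation below is the standard BSS_eval definition, not a formula taken from this paper:

```latex
\hat{s} = s_{\text{target}} + e_{\text{interf}} + e_{\text{noise}} + e_{\text{artif}},
\qquad
\mathrm{SDR} = 10 \log_{10} \frac{\lVert s_{\text{target}} \rVert^{2}}
{\lVert e_{\text{interf}} + e_{\text{noise}} + e_{\text{artif}} \rVert^{2}}
```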
Abstract
Recent work has shown that deep recurrent neural networks using the LSTM architecture can achieve strong single-channel speech enhancement by estimating time-frequency masks. However, these models do not naturally generalize to multi-channel inputs from varying microphone configurations. In contrast, spatial clustering techniques can achieve such generalization but lack a strong signal model. Our work proposes a combination of the two approaches. By using LSTMs to enhance spatial-clustering-based time-frequency masks, we achieve both the signal modeling performance of multiple single-channel LSTM-DNN speech enhancers and the signal separation performance and generality of multi-channel spatial clustering. We compare our proposed system to several baselines on the CHiME-3 dataset. We evaluate the quality of the audio from each system using SDR from the BSS_eval toolkit and PESQ. We…
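A minimal Python sketch of time-frequency masking, assuming magnitude spectrograms stored as nested lists indexed [frame][frequency bin] and assuming mixture magnitudes are approximately the sum of speech and noise magnitudes. The ideal ratio mask is a common training target for mask-estimating LSTMs but is not necessarily the target used in this paper, and `average_masks` is a hypothetical way to merge per-channel masks; the paper's actual combination step may differ.

```python
from typing import List

# Magnitude spectrogram: outer list over frames, inner list over frequency bins.
Spec = List[List[float]]


def ideal_ratio_mask(speech: Spec, noise: Spec, eps: float = 1e-8) -> Spec:
    """Per-bin ratio |S| / (|S| + |N|), a common target for mask-estimating networks."""
    return [
        [s / (s + n + eps) for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech, noise)
    ]


def apply_mask(mixture: Spec, mask: Spec) -> Spec:
    """Scale each time-frequency bin of the mixture by its mask value (0 to 1)."""
    return [
        [x * m for x, m in zip(x_row, m_row)]
        for x_row, m_row in zip(mixture, mask)
    ]


def average_masks(masks: List[Spec]) -> Spec:
    """Hypothetical combination step: average several masks bin by bin."""
    n = len(masks)
    # zip(*masks) walks frames across all masks; zip(*rows) walks bins within a frame.
    return [[sum(vals) / n for vals in zip(*rows)] for rows in zip(*masks)]
```

With speech `[[3.0, 0.0], [1.0, 2.0]]` and noise `[[1.0, 2.0], [1.0, 0.0]]`, the mask is roughly `[[0.75, 0.0], [0.5, 1.0]]`, and applying it to the summed mixture recovers the speech magnitudes. In a real system the mask would come from a trained network rather than from the clean speech, which is unavailable at test time.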
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory
