Multi-microphone Complex Spectral Mapping for Utterance-wise and Continuous Speech Separation
Zhong-Qiu Wang, Peidong Wang, DeLiang Wang

TL;DR
This paper introduces a multi-microphone complex spectral mapping technique using deep learning for effective speaker separation and dereverberation in reverberant environments, applicable to both offline and online scenarios.
Contribution
It presents a deep learning approach that predicts the real and imaginary (RI) components of target speech at a reference microphone from the RI components of multiple microphones, and integrates it with MVDR beamforming and post-filtering for improved separation.
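As a rough illustration of the input and output representation described above, the sketch below stacks the RI components of a multi-microphone STFT into a real-valued feature tensor and converts a two-channel RI prediction back to a complex spectrogram. The function names, the array layout (microphones, frames, frequencies), and the random test signal are assumptions made for illustration; the network that maps features to predictions is not shown, and this is not the authors' code.

```python
import numpy as np

def stack_ri(stft_multi):
    """Stack real and imaginary parts along the channel axis.

    stft_multi: complex array of shape (mics, frames, freqs).
    Returns a real array of shape (2 * mics, frames, freqs).
    """
    return np.concatenate([stft_multi.real, stft_multi.imag], axis=0)

def ri_to_complex(ri_pred):
    """Convert a (2, frames, freqs) RI prediction to a complex spectrogram."""
    return ri_pred[0] + 1j * ri_pred[1]

# Hypothetical 6-microphone mixture with 10 frames and 257 frequency bins.
rng = np.random.default_rng(0)
mix = rng.standard_normal((6, 10, 257)) + 1j * rng.standard_normal((6, 10, 257))
features = stack_ri(mix)  # network input: RI components of all microphones
```

A model trained on such features would output two channels per speaker (real and imaginary) for the reference microphone, which `ri_to_complex` turns back into a complex spectrogram.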
Findings
Achieves state-of-the-art results on simulated and real datasets.
Generalizes well from simulated training to real array conditions.
Effective for both offline and online continuous speech separation.
Abstract
We propose multi-microphone complex spectral mapping, a simple way of applying deep learning for time-varying non-linear beamforming, for speaker separation in reverberant conditions. We aim at both speaker separation and dereverberation. Our study first investigates offline utterance-wise speaker separation and then extends to block-online continuous speech separation (CSS). Assuming a fixed array geometry between training and testing, we train deep neural networks (DNN) to predict the real and imaginary (RI) components of target speech at a reference microphone from the RI components of multiple microphones. We then integrate multi-microphone complex spectral mapping with minimum variance distortionless response (MVDR) beamforming and post-filtering to further improve separation, and combine it with frame-level speaker counting for block-online CSS. Although our system is trained on…
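To make the MVDR step more concrete, here is a minimal sketch of one common way to derive a time-invariant MVDR beamformer from DNN estimates. It assumes that target estimates are available at every microphone, that the interference is approximated as the mixture minus the target estimate, and that the weights follow the widely used reference-channel formulation w = (Φn⁻¹ Φs) u / trace(Φn⁻¹ Φs). The excerpt does not state how the paper computes these statistics, so the function names, the diagonal loading, and these choices are illustrative assumptions.

```python
import numpy as np

def mvdr_weights(target, noise, ref=0, eps=1e-6):
    """Per-frequency MVDR weights from estimated target and noise STFTs.

    target, noise: complex arrays of shape (mics, frames, freqs).
    Returns complex weights of shape (freqs, mics).
    """
    n_mics = target.shape[0]
    # Spatial covariance matrices, summed over frames: (freqs, mics, mics).
    phi_s = np.einsum("mtf,ntf->fmn", target, target.conj())
    phi_n = np.einsum("mtf,ntf->fmn", noise, noise.conj())
    # Diagonal loading keeps the noise covariance well conditioned (assumption).
    load = eps * np.trace(phi_n, axis1=1, axis2=2).real / n_mics + 1e-10
    phi_n = phi_n + load[:, None, None] * np.eye(n_mics)
    # Solve Phi_n X = Phi_s for each frequency instead of forming an inverse.
    ratio = np.linalg.solve(phi_n, phi_s)
    trace = np.trace(ratio, axis1=1, axis2=2)
    # Select the reference-microphone column and normalize by the trace.
    return ratio[:, :, ref] / (trace[:, None] + 1e-12)

def apply_beamformer(weights, mix):
    """Compute w^H y for every frame and frequency; returns (frames, freqs)."""
    return np.einsum("fm,mtf->tf", weights.conj(), mix)
```

In a pipeline like the one the abstract describes, the beamformed output could then be passed, together with the mixture, to a second network acting as the post-filter; that stage is not sketched here.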
Taxonomy
Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Music and Audio Processing
