Multi-microphone Complex Spectral Mapping for Utterance-wise and Continuous Speech Separation

Zhong-Qiu Wang; Peidong Wang; DeLiang Wang

arXiv:2010.01703 · cs.SD · May 25, 2021

TL;DR

This paper introduces multi-microphone complex spectral mapping, a deep-learning technique for joint speaker separation and dereverberation in reverberant environments, applicable to both offline utterance-wise separation and block-online continuous speech separation.

Contribution

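It trains deep neural networks to predict the real and imaginary (RI) components of each target speaker at a reference microphone from the RI components of all microphones, and integrates these estimates with minimum variance distortionless response (MVDR) beamforming and post-filtering to further improve separation.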
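As the abstract describes, the network maps the RI components of all microphones to the RI components of target speech at a reference microphone. The sketch below shows one way such inputs and targets could be assembled with NumPy, assuming a complex multichannel STFT of shape (P, T, F) is already available. The function names are hypothetical, and the combined L1 loss on RI and magnitude is an assumption (a choice seen in related complex spectral mapping work), not something this page specifies.

```python
import numpy as np


def stack_ri_features(mix_stft: np.ndarray) -> np.ndarray:
    """DNN input: stack real and imaginary parts of all P microphones.

    mix_stft: complex array of shape (P, T, F), one STFT per microphone.
    Returns a real array of shape (2P, T, F) ordered as
    [Re(Y_1), ..., Re(Y_P), Im(Y_1), ..., Im(Y_P)].
    """
    return np.concatenate([mix_stft.real, mix_stft.imag], axis=0)


def complex_to_ri(spec: np.ndarray) -> np.ndarray:
    """DNN target: complex (T, F) target speech at the reference mic -> real (2, T, F)."""
    return np.stack([spec.real, spec.imag], axis=0)


def ri_to_complex(ri: np.ndarray) -> np.ndarray:
    """Inverse of complex_to_ri: real (2, T, F) -> complex (T, F)."""
    return ri[0] + 1j * ri[1]


def ri_mag_loss(est_ri: np.ndarray, ref_ri: np.ndarray) -> float:
    """Assumed loss: mean L1 on the RI components plus mean L1 on the magnitude."""
    ri_term = np.abs(est_ri - ref_ri).mean()
    mag_term = np.abs(np.hypot(est_ri[0], est_ri[1]) - np.hypot(ref_ri[0], ref_ri[1])).mean()
    return float(ri_term + mag_term)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    P, T, F = 4, 100, 257  # hypothetical 4-mic array, 100 frames, 257 frequency bins
    Y = rng.standard_normal((P, T, F)) + 1j * rng.standard_normal((P, T, F))
    x = stack_ri_features(Y)  # network input
    # Placeholder target: in practice, the clean (e.g., direct-path) speaker at mic 0.
    target = complex_to_ri(Y[0])
    print(x.shape, target.shape)  # (8, 100, 257) (2, 100, 257)
    print(ri_mag_loss(target, target))  # 0.0
```

Keeping the input as plain stacked RI channels leaves it to the network to exploit inter-microphone phase differences, which is why a fixed array geometry between training and testing matters.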

Findings

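01. Achieves state-of-the-art separation results on both simulated and real datasets.
02. Generalizes from simulated training data to a real array, provided the array geometry matches the one used in training.
03. Effective for both offline utterance-wise separation and block-online continuous speech separation (CSS).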
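To make the difference between offline utterance-wise processing and block-online processing concrete, here is a minimal sketch of splitting a frame stream into blocks with past context and optional look-ahead. The function name and the block parameters `hist`, `cur`, and `future` are hypothetical; this page does not give the paper's block configuration, and neither cross-block output stitching nor the frame-level speaker counting mentioned in the abstract is shown.

```python
from typing import Iterator, Tuple


def block_online_segments(
    num_frames: int, cur: int, hist: int, future: int = 0
) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (start, end, out_start, out_end) frame indices for block-online processing.

    Each block spans frames [start, end): up to `hist` past frames as context,
    `cur` new frames [out_start, out_end) whose separated output is emitted,
    and up to `future` look-ahead frames. Every frame is emitted exactly once.
    """
    if cur <= 0:
        raise ValueError("cur must be positive")
    for out_start in range(0, num_frames, cur):
        out_end = min(out_start + cur, num_frames)
        start = max(0, out_start - hist)
        end = min(num_frames, out_end + future)
        yield start, end, out_start, out_end


if __name__ == "__main__":
    for seg in block_online_segments(10, cur=4, hist=3, future=1):
        print(seg)
    # (0, 5, 0, 4)
    # (1, 9, 4, 8)
    # (5, 10, 8, 10)
```

Under this scheme a block cannot be processed until its last look-ahead frame arrives, so algorithmic latency grows with `cur + future`, while `hist` adds context without adding delay.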

Abstract

We propose multi-microphone complex spectral mapping, a simple way of applying deep learning for time-varying non-linear beamforming, for speaker separation in reverberant conditions. We aim at both speaker separation and dereverberation. Our study first investigates offline utterance-wise speaker separation and then extends to block-online continuous speech separation (CSS). Assuming a fixed array geometry between training and testing, we train deep neural networks (DNN) to predict the real and imaginary (RI) components of target speech at a reference microphone from the RI components of multiple microphones. We then integrate multi-microphone complex spectral mapping with minimum variance distortionless response (MVDR) beamforming and post-filtering to further improve separation, and combine it with frame-level speaker counting for block-online CSS. Although our system is trained on…

Taxonomy

Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Music and Audio Processing