Complex Neural Spatial Filter: Enhancing Multi-channel Target Speech Separation in Complex Domain
Rongzhi Gu, Shi-Xiong Zhang, Yuexian Zou, and Dong Yu

TL;DR
This paper introduces a novel complex neural spatial filter (cNSF) for multi-channel target speech separation. cNSF estimates the complex ratio mask (cRM) of the target speech directly in the complex domain, using complex-valued features and a U-Net structure.
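As a rough illustration of what estimating a cRM means, the sketch below applies a mask to a few time-frequency (T-F) bins of a mixture STFT. It uses made-up toy complex values rather than real STFT data, and the function names (`apply_crm`, `apply_crm_split`, `ideal_crm`) are hypothetical. It also shows that multiplying by a complex mask is the same operation as combining separately predicted real and imaginary mask components.

```python
# Minimal sketch of the cRM training target and how a cRM is applied.
# Toy values only; each list element stands for one T-F bin of a single channel.


def apply_crm(mask, mixture):
    """Estimate the target as the element-wise complex product S_hat = M * Y."""
    return [m * y for m, y in zip(mask, mixture)]


def apply_crm_split(mask_re, mask_im, mix_re, mix_im):
    """Same product written with separate real and imaginary parts:
    (Mr + jMi)(Yr + jYi) = (Mr*Yr - Mi*Yi) + j(Mr*Yi + Mi*Yr)."""
    out_re = [mr * yr - mi * yi for mr, mi, yr, yi in zip(mask_re, mask_im, mix_re, mix_im)]
    out_im = [mr * yi + mi * yr for mr, mi, yr, yi in zip(mask_re, mask_im, mix_re, mix_im)]
    return out_re, out_im


def ideal_crm(target, mixture, eps=1e-8):
    """Oracle cRM M = S / Y, a typical training target for cRM estimation."""
    return [s / y if abs(y) > eps else 0j for s, y in zip(target, mixture)]


if __name__ == "__main__":
    mix = [1 + 2j, -0.5 + 0.3j, 2 - 1j]
    tgt = [0.8 + 1.5j, -0.1 + 0.2j, 1.0 - 0.2j]
    mask = ideal_crm(tgt, mix)
    print(apply_crm(mask, mix))  # recovers tgt up to floating-point error
```

Because both routes give the same product, the distinction the paper draws is not in how the mask is applied but in how the network that predicts it is parameterized: complex-valued, versus separate real-valued outputs for the real and imaginary parts.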
Contribution
The study proposes a complex-valued neural network with a U-Net structure that estimates the cRM directly in the complex domain, improving speech separation performance.
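The summary does not say how cNSF's complex-valued layers are implemented. A common construction, assumed here, builds a complex convolution from real-valued convolutions: for a kernel W = W_r + jW_i and an input x = x_r + jx_i, the output is (W_r*x_r - W_i*x_i) + j(W_r*x_i + W_i*x_r). The minimal 1-D sketch below checks that identity against a direct complex computation. It does not model the U-Net itself, and the names `xcorr_valid` and `complex_conv1d` are hypothetical.

```python
# Sketch: complex-valued 1-D convolution assembled from real-valued convolutions.
# Uses "valid" cross-correlation, as deep-learning conv layers usually do.


def xcorr_valid(x, w):
    """'Valid' cross-correlation: out[t] = sum_i x[t + i] * w[i].
    Works for real or Python complex sequences."""
    k = len(w)
    return [sum(x[t + i] * w[i] for i in range(k)) for t in range(len(x) - k + 1)]


def complex_conv1d(x_re, x_im, w_re, w_im):
    """Complex conv from four real convs:
    (W_r + jW_i) * (x_r + jx_i) = (W_r*x_r - W_i*x_i) + j(W_r*x_i + W_i*x_r)."""
    rr = xcorr_valid(x_re, w_re)
    ii = xcorr_valid(x_im, w_im)
    ri = xcorr_valid(x_im, w_re)
    ir = xcorr_valid(x_re, w_im)
    out_re = [a - b for a, b in zip(rr, ii)]
    out_im = [a + b for a, b in zip(ri, ir)]
    return out_re, out_im


if __name__ == "__main__":
    x = [1 + 1j, 2 - 1j, 0.5 + 0j, -1 + 2j, 3 - 0.5j]
    w = [0.5 - 0.25j, 1 + 1j]
    re, im = complex_conv1d([z.real for z in x], [z.imag for z in x],
                            [z.real for z in w], [z.imag for z in w])
    direct = xcorr_valid(x, w)  # reference computed with complex arithmetic
    print(all(abs(complex(r, i) - z) < 1e-12 for r, i, z in zip(re, im, direct)))
```

In a trainable layer, W_r and W_i would each be reused in two of the four real convolutions, so the layer can be built from standard real-valued operations while still coupling the real and imaginary parts.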
Findings
cNSF outperforms the baseline NSF by 12.1% in scale-invariant signal-to-distortion ratio (SI-SDR)
Achieves a 33.1% reduction in word error rate (WER)
Effectively exploits complex-valued features for speech separation
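SI-SDR is the separation metric behind the first finding. The sketch below follows the commonly used definition: both signals are made zero-mean, the reference is optimally rescaled by alpha = <e, s> / ||s||^2, and the ratio of scaled-target energy to residual energy is reported in dB. It is not the authors' evaluation script, whose details are not given here, and `si_sdr` is a hypothetical name.

```python
import math


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant SDR in dB between two equal-length real signals."""
    # Remove the mean of each signal, as is customary for SI-SDR.
    n = len(reference)
    mr = sum(reference) / n
    me = sum(estimate) / n
    s = [r - mr for r in reference]
    e = [x - me for x in estimate]
    # Optimal scaling of the reference: alpha = <e, s> / ||s||^2.
    alpha = sum(a * b for a, b in zip(e, s)) / (sum(a * a for a in s) + eps)
    target = [alpha * a for a in s]
    residual = [x - t for x, t in zip(e, target)]
    num = sum(t * t for t in target)
    den = sum(v * v for v in residual)
    return 10 * math.log10((num + eps) / (den + eps))


if __name__ == "__main__":
    ref = [1.0, -1.0, 1.0, -1.0]
    est = [1.5, -0.5, 0.5, -1.5]  # ref plus an orthogonal error of energy 1
    print(round(si_sdr(est, ref), 2))  # 6.02, i.e. 10*log10(4)
```

Rescaling the estimate by any nonzero factor leaves the score unchanged, which is why SI-SDR does not reward or penalize a separator for output gain.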
Abstract
To date, mainstream target speech separation (TSS) approaches are formulated to estimate the complex ratio mask (cRM) of the target speech in the time-frequency domain under a supervised deep learning framework. However, existing deep models for estimating the cRM are designed such that its real and imaginary parts are modeled separately using real-valued training data pairs. The motivation of this study is to design a deep model that fully exploits the temporal-spectral-spatial information of multi-channel signals to estimate the cRM directly and efficiently in the complex domain. To this end, a novel TSS network is designed consisting of two modules: a complex neural spatial filter (cNSF) and a minimum variance distortionless response (MVDR) beamformer. Essentially, cNSF is a cRM estimation model, and the MVDR module is cascaded after the cNSF module to reduce the nonlinear speech distortions introduced by the neural network.…
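The abstract cascades an MVDR beamformer after cNSF but does not give its formulation. The sketch below assumes the textbook mask-based variant for one frequency bin of a 2-microphone array: a noise spatial covariance is estimated as a mask-weighted average of y_t y_t^H, and the weights are w = Phi_N^{-1} d / (d^H Phi_N^{-1} d), which pass the target direction d with unit gain while minimizing residual noise power. The steering vector comes from a hypothetical phase offset rather than being estimated, the mask weights are toy values, and every function name is hypothetical; how cNSF's cRM is actually turned into covariance statistics is not stated in the abstract.

```python
import cmath

# Sketch of a mask-based MVDR beamformer for one frequency bin of a 2-mic array.
# Hypothetical helper names; matrices are nested lists of Python complex numbers.


def two_mic_steering(phase):
    """Hypothetical steering vector [1, e^{-j*phase}] (mic 0 is the reference)."""
    return [1 + 0j, cmath.exp(-1j * phase)]


def masked_covariance(frames, mask):
    """Mask-weighted spatial covariance: sum_t m_t y_t y_t^H / sum_t m_t."""
    n_ch = len(frames[0])
    phi = [[0j] * n_ch for _ in range(n_ch)]
    for y, m in zip(frames, mask):
        for i in range(n_ch):
            for j in range(n_ch):
                phi[i][j] += m * y[i] * y[j].conjugate()
    total = sum(mask)
    return [[v / total for v in row] for row in phi]


def inv2(a, load=1e-6):
    """Inverse of a 2x2 matrix after adding small diagonal loading for stability."""
    p, q = a[0][0] + load, a[0][1]
    r, s = a[1][0], a[1][1] + load
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]


def mvdr_weights(phi_n, d):
    """w = Phi_N^{-1} d / (d^H Phi_N^{-1} d): unit gain toward d, minimum noise power."""
    inv = inv2(phi_n)
    num = [sum(inv[i][j] * d[j] for j in range(2)) for i in range(2)]
    den = sum(d[i].conjugate() * num[i] for i in range(2))
    return [x / den for x in num]


def beamform(w, y):
    """Beamformer output w^H y for one T-F bin."""
    return sum(wi.conjugate() * yi for wi, yi in zip(w, y))


if __name__ == "__main__":
    d = two_mic_steering(0.7)
    noise = [[0.3 + 0.1j, -0.2 + 0.4j], [1.0 - 0.5j, 0.8 + 0.2j],
             [-0.4 + 0.9j, 0.1 - 0.3j], [0.6 + 0.0j, -0.7 + 0.5j]]
    noise_mask = [0.9, 0.8, 1.0, 0.7]  # hypothetical noise-dominance weights
    w = mvdr_weights(masked_covariance(noise, noise_mask), d)
    print(abs(beamform(w, d)))  # distortionless constraint: close to 1.0
```

Because the beamformer is a fixed linear filter per frequency, applying it avoids the nonlinear distortions a mask alone can introduce, which matches the motivation the abstract gives for the cascade.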
