Complex Neural Spatial Filter: Enhancing Multi-channel Target Speech Separation in Complex Domain

Rongzhi Gu, Shi-Xiong Zhang, Yuexian Zou, and Dong Yu

arXiv:2104.12359·cs.SD·September 8, 2021

TL;DR

This paper introduces a novel complex neural spatial filter (cNSF) for multi-channel target speech separation that directly estimates the complex ratio mask in the complex domain, leveraging complex-valued features and a U-Net structure.

Contribution

The study proposes a new complex-valued neural network model with a U-Net structure for direct complex domain cRM estimation, improving speech separation performance.

Findings

1. cNSF outperforms baseline NSF by 12.1% in SI-SDR

2. Achieves 33.1% reduction in word error rate

3. Effectively exploits complex-valued features for speech separation
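
For readers unfamiliar with SI-SDR (scale-invariant signal-to-distortion ratio), the metric cited in the findings, the following is a minimal sketch of its commonly used definition. The function name `si_sdr` and the `eps` stabilizer are illustrative assumptions; this is not taken from the paper's evaluation code.

```python
import numpy as np

def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant SDR in dB for two 1-D real-valued signals.

    Hypothetical helper: follows the usual definition, in which the
    estimate is projected onto the reference before the energy ratio
    is taken, so rescaling the estimate does not change the score.
    """
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Remove the mean so a DC offset does not affect the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Optimal scaling of the reference toward the estimate.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference      # component explained by the reference
    residual = estimate - target    # everything else counts as distortion
    ratio = (np.sum(target ** 2) + eps) / (np.sum(residual ** 2) + eps)
    return 10.0 * np.log10(ratio)
```

Because of the projection step, `si_sdr(2 * estimate, reference)` equals `si_sdr(estimate, reference)`, which is what makes the metric insensitive to the output gain of a separation system.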

Abstract

To date, mainstream target speech separation (TSS) approaches estimate the complex ratio mask (cRM) of the target speech in the time-frequency domain under a supervised deep learning framework. However, existing deep models for cRM estimation model the real and imaginary parts of the cRM separately, using real-valued training data pairs. The motivation of this study is to design a deep model that fully exploits the temporal-spectral-spatial information of multi-channel signals to estimate the cRM directly and efficiently in the complex domain. To this end, a novel TSS network is designed that consists of two modules: a complex neural spatial filter (cNSF) and an MVDR. Essentially, cNSF is a cRM estimation model, and an MVDR module is cascaded after the cNSF module to reduce the nonlinear speech distortions introduced by the neural network.…
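
The two stages named in the abstract, applying a cRM and then an MVDR beamformer, can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the cRM is taken as given (the network that estimates it is not modeled), the MVDR uses the standard reference-channel formulation w = (Φn⁻¹ Φs) u / trace(Φn⁻¹ Φs), and the function names, array layout `(channels, freq, frames)`, and `eps` loading are hypothetical choices rather than details from the paper.

```python
import numpy as np

def apply_crm(mixture_stft, crm):
    """Apply a complex ratio mask by complex multiplication: S_hat = M * Y."""
    return crm * mixture_stft

def mvdr_weights(target_stft, noise_stft, ref_channel=0, eps=1e-8):
    """Reference-channel MVDR weights, one vector per frequency bin.

    Inputs are complex arrays shaped (channels, freq, frames), for example
    the masked target estimate and the residual (mixture minus target).
    Returns an array shaped (freq, channels).
    """
    n_ch, _, n_frames = target_stft.shape
    # Spatial covariance matrices per frequency, shape (freq, C, C).
    phi_s = np.einsum('cft,dft->fcd', target_stft, target_stft.conj()) / n_frames
    phi_n = np.einsum('cft,dft->fcd', noise_stft, noise_stft.conj()) / n_frames
    # Diagonal loading keeps the noise covariance invertible (assumed value).
    phi_n = phi_n + eps * np.eye(n_ch)
    # Phi_n^{-1} Phi_s, solved per frequency without an explicit inverse.
    numerator = np.linalg.solve(phi_n, phi_s)
    trace = np.trace(numerator, axis1=-2, axis2=-1)[:, None]
    # Selecting a column is the same as multiplying by the one-hot vector u.
    return numerator[:, :, ref_channel] / (trace + eps)

def beamform(weights, mixture_stft):
    """Apply w^H y at every time-frequency bin; returns shape (freq, frames)."""
    return np.einsum('fc,cft->ft', weights.conj(), mixture_stft)
```

In a pipeline of this kind, `apply_crm` would give a per-channel target estimate, `mvdr_weights` would turn that estimate and the residual into a linear spatial filter, and `beamform` would apply the filter to the mixture. Because the final output is a linear combination of the microphone signals, it avoids the nonlinear distortions that direct masking can introduce, which is the role the abstract assigns to the MVDR module.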
