Multi-channel Multi-frame ADL-MVDR for Target Speech Separation

Zhuohuang Zhang; Yong Xu; Meng Yu; Shi-Xiong Zhang; Lianwu Chen; Donald S. Williamson; Dong Yu

arXiv:2012.13442 · eess.AS · November 17, 2021 · 5 cites

TL;DR

This paper introduces a multi-channel multi-frame (MCMF) deep-learning-based MVDR approach for target speech separation that reduces residual noise and distortion, improves ASR performance, and outperforms existing methods.

Contribution

The paper presents an MCMF ADL-MVDR system that extends the authors' earlier multi-channel ADL-MVDR approach to reduce linear and nonlinear distortions and to exploit spatio-temporal correlations for speech separation.
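
To make "spatio-temporal correlations" concrete: a multi-frame filter works on a vector that stacks, for each microphone, the current STFT frame together with a few previous frames, so its covariance matrix captures both inter-channel (spatial) and inter-frame (temporal) correlations. The sketch below is a generic multi-frame construction, not the paper's learned estimator. It assumes NumPy, an STFT array shaped (channels, frames, freqs), and a conventional mask-weighted covariance estimate; the function names `stack_frames` and `masked_covariance` are hypothetical.

```python
import numpy as np


def stack_frames(spec, num_frames):
    """Stack each channel's current frame with its num_frames - 1 previous frames.

    spec: complex STFT of shape (channels, frames, freqs).
    Returns shape (frames, freqs, channels * num_frames), ordered channel-major
    (index c * num_frames + lag). Frames before t = 0 are zero-padded.
    """
    channels, frames, freqs = spec.shape
    pad = np.zeros((channels, num_frames - 1, freqs), dtype=spec.dtype)
    padded = np.concatenate([pad, spec], axis=1)
    # Lag 0 is the current frame; lag l is the frame l steps earlier.
    start = num_frames - 1
    lags = [padded[:, start - lag : start - lag + frames, :] for lag in range(num_frames)]
    stacked = np.stack(lags, axis=1)  # (channels, num_frames, frames, freqs)
    return stacked.reshape(channels * num_frames, frames, freqs).transpose(1, 2, 0)


def masked_covariance(stacked, mask):
    """Mask-weighted spatio-temporal covariance for each frequency bin.

    stacked: (frames, freqs, D) complex vectors; mask: (frames, freqs) weights in [0, 1].
    Returns (freqs, D, D) Hermitian matrices.
    """
    weighted = stacked * mask[..., None]
    # Sum over frames of m(t, f) * y(t, f) y(t, f)^H.
    cov = np.einsum("tfi,tfj->fij", weighted, stacked.conj())
    # Normalize by the total mask weight per bin (guarded against division by zero).
    return cov / np.maximum(mask.sum(axis=0), 1e-8)[:, None, None]
```

With 4 microphones and 3 stacked frames, each per-frequency covariance is 12 × 12 instead of 4 × 4, which is the cost of modeling inter-frame correlations alongside inter-channel ones.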

Findings

1. Outperforms state-of-the-art methods on a Mandarin audio-visual corpus
2. Reduces residual noise and distortion in separated speech
3. Improves automatic speech recognition (ASR) accuracy

Abstract

Many purely neural-network-based speech separation approaches have been proposed to improve objective assessment scores, but they often introduce nonlinear distortions that are harmful to modern automatic speech recognition (ASR) systems. Minimum variance distortionless response (MVDR) filters are often adopted to remove nonlinear distortions; however, conventional neural mask-based MVDR systems still result in relatively high levels of residual noise. Moreover, the matrix inverse involved in the MVDR solution is sometimes numerically unstable during joint training with neural networks. In this study, we propose a multi-channel multi-frame (MCMF) all deep learning (ADL)-MVDR approach for target speech separation, which extends our preliminary multi-channel ADL-MVDR approach. The proposed MCMF ADL-MVDR system addresses linear and nonlinear distortions. Spatio-temporal cross correlations…
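
For reference, one common formulation of the conventional mask-based MVDR filter (the Souden-style solution) computes, for each frequency, w = Φ_NN⁻¹ Φ_SS u / tr(Φ_NN⁻¹ Φ_SS), where Φ_SS and Φ_NN are the speech and noise covariance matrices and u is a one-hot vector selecting a reference microphone; the target estimate is ŝ = wᴴy. The minimal NumPy sketch below assumes both covariance matrices are already estimated (for example, from neural masks). The function names and the diagonal-loading constant are illustrative assumptions, and this is the conventional baseline, not the paper's ADL-MVDR.

```python
import numpy as np


def mvdr_weights(phi_ss, phi_nn, ref=0, diag_load=1e-6):
    """Souden-style MVDR weights for each frequency bin.

    phi_ss, phi_nn: (freqs, mics, mics) speech / noise covariance matrices.
    ref: index of the reference microphone (the one-hot vector u).
    Returns complex weights of shape (freqs, mics).
    """
    mics = phi_nn.shape[-1]
    # Diagonal loading (an illustrative heuristic) keeps the solve better conditioned.
    loaded = phi_nn + diag_load * np.eye(mics)
    # Phi_NN^{-1} Phi_SS computed with a batched linear solve.
    numerator = np.linalg.solve(loaded, phi_ss)
    trace = np.trace(numerator, axis1=1, axis2=2)
    # Selecting column `ref` is the same as multiplying by the one-hot vector u.
    return numerator[:, :, ref] / trace[:, None]


def apply_beamformer(w, y):
    """Filter a multichannel STFT y of shape (mics, frames, freqs): s_hat = w^H y."""
    return np.einsum("fm,mtf->tf", w.conj(), y)
```

The `np.linalg.solve` call is the matrix-inversion step that the abstract describes as sometimes numerically unstable during joint training with neural networks; diagonal loading is only a common mitigation and is not what the paper proposes.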

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis