End-to-End Multi-Channel Speech Separation

Rongzhi Gu; Jian Wu; Shi-Xiong Zhang; Lianwu Chen; Yong Xu; Meng Yu; Dan Su; Yuexian Zou; Dong Yu

arXiv:1905.06286·cs.SD·May 29, 2019·80 cites

Open Access

TL;DR

This paper introduces an end-to-end multi-channel speech separation model that operates directly on waveforms, reformulates the STFT and inter-channel phase difference (IPD) as time-domain convolutions with learnable kernels, and reports a significant improvement on the WSJ0 far-field speech separation task.

Contribution

It proposes a fully data-driven, end-to-end neural network architecture for multi-channel speech separation, in which the traditional STFT and IPD spatial features are reformulated as time-domain convolutions whose kernels are learned.

Findings

01

Significant performance improvement over previous methods

02

Effective integration of spatial features as learnable components

03

End-to-end training from waveform input to output

Abstract

The end-to-end approach to single-channel speech separation has been studied recently and has shown promising results. This paper extends that approach and proposes a new end-to-end model for multi-channel speech separation. The primary contributions of this work are: 1) an integrated waveform-in, waveform-out separation system in a single neural network architecture; 2) a reformulation of the traditional short-time Fourier transform (STFT) and inter-channel phase difference (IPD) as functions of time-domain convolution with a special kernel; 3) a relaxation of those fixed kernels to be learnable, so that the entire architecture becomes purely data-driven and can be trained end to end. We demonstrate on the WSJ0 far-field speech separation task that, with the benefit of learnable spatial features, our proposed end-to-end multi-channel model significantly improved the…
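The reformulation in contribution 2 can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes the standard DFT definition, and the function names (`stft_kernels`, `conv_stft`, `ipd`), frame length, hop size, and window are hypothetical choices made for illustration. The idea shown is that a windowed STFT equals a strided time-domain convolution with fixed cosine and sine kernels, and that the IPD is the phase difference between two channels' spectra. In a learnable variant, these fixed kernels would serve only as an initialization.

```python
import numpy as np

def stft_kernels(frame_len, window):
    """Fixed cosine/sine kernels that reproduce a windowed DFT (illustrative)."""
    n = np.arange(frame_len)                      # sample index inside a frame
    k = np.arange(frame_len // 2 + 1)[:, None]    # frequency bin index
    angle = 2.0 * np.pi * k * n / frame_len
    real_k = window * np.cos(angle)               # real part of each bin
    imag_k = -window * np.sin(angle)              # imaginary part of each bin
    return real_k, imag_k                         # each has shape (F, N)

def conv_stft(x, frame_len, hop, window):
    """STFT computed as a strided 1-D convolution (correlation) with the kernels."""
    real_k, imag_k = stft_kernels(frame_len, window)
    n_frames = 1 + (len(x) - frame_len) // hop
    # Sliding the kernels over x with stride `hop` is a dot product per frame.
    frames = np.stack([x[t * hop : t * hop + frame_len] for t in range(n_frames)])
    return frames @ real_k.T + 1j * (frames @ imag_k.T)   # shape (T, F)

def ipd(spec_a, spec_b):
    """Inter-channel phase difference, wrapped to (-pi, pi]."""
    return np.angle(spec_a * np.conj(spec_b))

# Small demo on random two-channel data (assumed sizes, not from the paper).
rng = np.random.default_rng(0)
mics = rng.standard_normal((2, 256))
win = np.hanning(64)
spec0 = conv_stft(mics[0], 64, 16, win)
spec1 = conv_stft(mics[1], 64, 16, win)
frames0 = np.stack([mics[0][t * 16 : t * 16 + 64] for t in range(spec0.shape[0])])
reference = np.fft.rfft(frames0 * win, axis=-1)   # conventional framed FFT
print("max |conv STFT - FFT STFT|:", np.abs(spec0 - reference).max())
print("IPD shape (frames, bins):", ipd(spec0, spec1).shape)
```

Because the kernels are ordinary arrays, they could be registered as trainable weights in a deep learning framework, which is the sense in which the fixed transform can be relaxed into a learnable one.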

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis

Methods: Convolution