A Multi-Stage Triple-Path Method for Speech Separation in Noisy and Reverberant Environments

Zhaoxi Mu; Xinyu Yang; Xiangyuan Yang; Wenjing Zhu

arXiv:2303.03732 · cs.SD · March 8, 2023 · 1 citation

Open Access

TL;DR

This paper introduces a multi-stage triple-path method that enhances speech separation in noisy and reverberant environments by decomposing the task into denoising, separation, and de-reverberation, utilizing a triple-path structure for better channel modeling.

Contribution

It presents a novel multi-stage end-to-end approach with a triple-path structure specifically designed for challenging noisy and reverberant conditions, improving performance with minimal parameter increase.

Findings

01. Improved speech separation performance in noisy and reverberant environments.

02. Effective modeling of channel information with the triple-path structure.

03. Better results with only a minimal increase in parameter count.

Abstract

In noisy and reverberant environments, the performance of deep learning-based speech separation methods drops dramatically because previous methods are not designed and optimized for such situations. To address this issue, we propose a multi-stage end-to-end learning method that decouples the difficult speech separation problem in noisy and reverberant environments into three sub-problems: speech denoising, separation, and de-reverberation. The probability and speed of searching for the optimal solution of the speech separation model are improved by reducing the solution space. Moreover, since the channel information of the audio sequence in the time domain is crucial for speech separation, we propose a triple-path structure capable of modeling the channel dimension of audio sequences. Experimental results show that the proposed multi-stage triple-path method can improve the performance…
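
To make the triple-path idea concrete, here is a minimal sketch of one block that models a feature tensor along three axes in turn. It is an illustration under stated assumptions, not the authors' implementation: the tensor layout `(C, K, S)` (channels, chunk length, number of chunks), the function names, and the use of a plain linear map with a residual connection in place of a learned sequence model are all hypothetical choices, since the abstract does not specify them.

```python
import numpy as np


def mix_along_axis(x, weight, axis):
    """Apply a linear map along one axis of x and add a residual connection.

    The linear map is a stand-in (assumption) for whatever learned sequence
    model a real system would run along that axis.
    """
    moved = np.moveaxis(x, axis, -1)         # bring the target axis last
    mixed = moved @ weight                   # (..., n) @ (n, n) -> (..., n)
    return x + np.moveaxis(mixed, -1, axis)  # restore layout, add residual


def triple_path_block(x, w_intra, w_inter, w_chan):
    """Model a (C, K, S) tensor along three paths in sequence.

    C = channels, K = chunk length, S = number of chunks (assumed layout).
    """
    x = mix_along_axis(x, w_intra, axis=1)  # path 1: within each chunk
    x = mix_along_axis(x, w_inter, axis=2)  # path 2: across chunks
    x = mix_along_axis(x, w_chan, axis=0)   # path 3: across channels
    return x


rng = np.random.default_rng(0)
C, K, S = 4, 5, 3
x = rng.standard_normal((C, K, S))
out = triple_path_block(
    x,
    rng.standard_normal((K, K)),
    rng.standard_normal((S, S)),
    rng.standard_normal((C, C)),
)
print(out.shape)
```

The first two paths correspond to the usual dual-path treatment of chunked sequences (within chunks, then across chunks); the third path is what adds modeling along the channel dimension. Every path preserves the tensor shape, so blocks of this form can be stacked.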

Taxonomy

Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Indoor and Outdoor Localization Technologies

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings