A Multi-Stage Triple-Path Method for Speech Separation in Noisy and Reverberant Environments
Zhaoxi Mu, Xinyu Yang, Xiangyuan Yang, Wenjing Zhu

TL;DR
This paper introduces a multi-stage triple-path method that improves speech separation in noisy and reverberant environments by decomposing the task into three sub-problems (denoising, separation, and de-reverberation) and by using a triple-path structure to model the channel dimension of the audio sequence.
Contribution
It presents a novel multi-stage end-to-end approach with a triple-path structure, designed specifically for noisy and reverberant conditions, that improves performance with only a small increase in parameter count.
Findings
Improved speech separation performance in noisy and reverberant environments.
The triple-path structure effectively models channel information.
These gains come with only a minimal increase in model complexity.
Abstract
In noisy and reverberant environments, the performance of deep learning-based speech separation methods drops dramatically because previous methods are not designed and optimized for such situations. To address this issue, we propose a multi-stage end-to-end learning method that decouples the difficult speech separation problem in noisy and reverberant environments into three sub-problems: speech denoising, separation, and de-reverberation. The probability and speed of searching for the optimal solution of the speech separation model are improved by reducing the solution space. Moreover, since the channel information of the audio sequence in the time domain is crucial for speech separation, we propose a triple-path structure capable of modeling the channel dimension of audio sequences. Experimental results show that the proposed multi-stage triple-path method can improve the performance…
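The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the three paths are an intra-chunk path, an inter-chunk path, and a channel path acting on a tensor of shape (channels, chunk length, number of chunks), and that the three stages run in the order listed in the abstract; the abstract confirms neither detail. All names (`mix_along`, `triple_path_block`, `multi_stage`) are hypothetical, and plain linear maps stand in for whatever sequence models the paper actually uses.

```python
import numpy as np

def mix_along(x, axis, weight):
    """Apply a linear map along one axis of x and add a residual connection."""
    moved = np.moveaxis(x, axis, -1)         # bring the target axis last
    mixed = moved @ weight                   # (..., n) @ (n, n) -> (..., n)
    return x + np.moveaxis(mixed, -1, axis)  # restore the axis order, add residual

def triple_path_block(x, w_intra, w_inter, w_chan):
    """One pass over the three assumed paths of a (N, K, S) tensor."""
    x = mix_along(x, 1, w_intra)  # intra-chunk path: within each chunk (axis K)
    x = mix_along(x, 2, w_inter)  # inter-chunk path: across chunks (axis S)
    x = mix_along(x, 0, w_chan)   # channel path: across feature channels (axis N)
    return x

def multi_stage(x, stages):
    """Run the stages in sequence and keep every intermediate output."""
    outputs = []
    for stage in stages:
        x = stage(x)
        outputs.append(x)
    return outputs

# Toy sizes, chosen only for illustration.
N, K, S = 4, 8, 5
rng = np.random.default_rng(0)
x = rng.standard_normal((N, K, S))

def make_stage():
    # Each placeholder stage is a single triple-path block with small random weights.
    w_intra = 0.1 * rng.standard_normal((K, K))
    w_inter = 0.1 * rng.standard_normal((S, S))
    w_chan = 0.1 * rng.standard_normal((N, N))
    return lambda t: triple_path_block(t, w_intra, w_inter, w_chan)

# Assumed stage order: denoising, separation, de-reverberation.
stage_outputs = multi_stage(x, [make_stage(), make_stage(), make_stage()])
print([out.shape for out in stage_outputs])
```

Keeping the intermediate outputs is a design choice of this sketch: it is one way each sub-problem could receive its own training target, which is the sense in which a multi-stage method narrows the solution space.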
Taxonomy
Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Indoor and Outdoor Localization Technologies
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
