Toward Speech Separation in The Pre-Cocktail Party Problem with TasTas
Ziqiang Shi, Jiqing Han

TL;DR
This paper applies TasTas to monaural speech separation in the pre-cocktail party problem, reporting a 10.41 dB SDR improvement on WSJ0-5mix (11.14 dB with online data remixing augmentation), built on an open-source re-implementation of DPRNN-TasNet.
Contribution
It applies the previously proposed TasTas to end-to-end monaural speech separation on WSJ0-5mix and builds it on an open-source re-implementation of DPRNN-TasNet, which is intended to make the results easy to reproduce.
Findings
10.41 dB SDR improvement on WSJ0-5mix
11.14 dB SDR improvement with online data remixing augmentation in training
Open-source re-implementation of DPRNN-TasNet, on which TasTas is built
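For readers unfamiliar with the metric, the following is a minimal sketch of how an "SDR improvement" figure can be computed: the SDR of the separated estimate minus the SDR of the unprocessed mixture, both measured against the clean reference. It assumes a simplified energy-ratio SDR; the paper's exact evaluation protocol (for example a BSS Eval style SDR) is not stated in this summary, and the function names and toy signals are hypothetical.

```python
import math


def sdr_db(reference, estimate):
    """Simplified SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    Assumption: a plain energy ratio, not the full BSS Eval decomposition.
    """
    eps = 1e-12  # guards against log(0) and division by zero
    signal_energy = sum(s * s for s in reference)
    error_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))


def sdr_improvement_db(reference, estimate, mixture):
    """SDR of the estimate minus SDR of the raw mixture, same reference."""
    return sdr_db(reference, estimate) - sdr_db(reference, mixture)


# Toy signals (hypothetical): the mixture is off by 0.5 per sample,
# the separated estimate is off by only 0.05 per sample.
reference = [1.0, -1.0, 1.0, -1.0]
mixture = [1.5, -0.5, 1.5, -0.5]
estimate = [1.05, -0.95, 1.05, -0.95]

improvement = sdr_improvement_db(reference, estimate, mixture)
print(f"SDR improvement: {improvement:.2f} dB")
```

In this toy case the error amplitude shrinks by a factor of ten, which corresponds to an improvement of about 20 dB.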
Abstract
In this note, we propose using TasTas \cite{shi2020speech} as an end-to-end approach to monaural speech separation in the pre-cocktail party problem. Our experiments on the public WSJ0-5mix corpus result in a 10.41 dB SDR improvement. If online voice data remixing augmentation \cite{zeghidour2020wavesplit} is adopted in training, an 11.14 dB SDR improvement can be achieved. We have open-sourced our re-implementation of DPRNN-TasNet at https://github.com/ShiZiqiang/dual-path-RNNs-DPRNNs-based-speech-separation, and our TasTas is built on this implementation, so we believe the results in this paper can be reproduced with ease.
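The online remixing augmentation mentioned in the abstract can be pictured with a small sketch. It shows one plausible reading: within each training batch, the clean sources in every speaker slot are shuffled across examples and then summed into fresh mixtures, so the model rarely sees the same mixture twice. The abstract does not specify the exact procedure, so the function `remix_batch`, the data layout, and the shuffling scheme are all assumptions made for illustration.

```python
import random


def remix_batch(batch_sources, rng):
    """Shuffle sources across a batch and rebuild the mixtures.

    batch_sources[b][c] is the waveform (list of floats) of source c in
    example b. Hypothetical layout: all waveforms have equal length.
    Returns (new_sources, new_mixtures).
    """
    num_examples = len(batch_sources)
    num_sources = len(batch_sources[0])
    new_sources = [[None] * num_sources for _ in range(num_examples)]

    for c in range(num_sources):
        # Independently permute which example each slot-c source comes from.
        order = list(range(num_examples))
        rng.shuffle(order)
        for b, source_example in enumerate(order):
            new_sources[b][c] = batch_sources[source_example][c]

    # Each new mixture is the sample-wise sum of its reassigned sources.
    new_mixtures = [
        [sum(samples) for samples in zip(*sources)] for sources in new_sources
    ]
    return new_sources, new_mixtures


# Toy batch (hypothetical): 3 examples, 2 sources each, 2 samples per source.
batch = [
    [[1.0, 2.0], [10.0, 20.0]],
    [[3.0, 4.0], [30.0, 40.0]],
    [[5.0, 6.0], [50.0, 60.0]],
]
sources, mixtures = remix_batch(batch, random.Random(0))
```

A real pipeline would also need to avoid placing two utterances from the same speaker in one mixture and might rescale source gains; both are omitted here to keep the sketch short.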
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
