Toward Speech Separation in The Pre-Cocktail Party Problem with TasTas

Ziqiang Shi; Jiqing Han

arXiv:2009.03692·eess.AS·June 26, 2023

TL;DR

This paper applies TasTas to monaural speech separation in the pre-cocktail party problem, reporting a 10.41 dB SDR improvement on WSJ0-5mix (11.14 dB with online data remixing augmentation). The underlying DPRNN-TasNet re-implementation is open source to support reproducibility.

Contribution

It applies TasTas, proposed in the authors' earlier work, to end-to-end monaural speech separation on WSJ0-5mix and releases the DPRNN-TasNet re-implementation that TasTas is built on.

Findings

01. 10.41 dB SDR improvement on WSJ0-5mix
02. 11.14 dB SDR improvement with online data remixing augmentation
03. Open-source re-implementation of DPRNN-TasNet, on which TasTas is built
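
An SDR improvement compares the quality of a separated estimate against that of the unprocessed mixture, both measured relative to the clean reference. The following is a minimal sketch under a simplified definition, SDR = 10 log10(||s||^2 / ||s - s_hat||^2); the exact metric variant used in the paper is not stated in this listing, and the function names `sdr_db` and `sdr_improvement_db` are hypothetical.

```python
import numpy as np


def sdr_db(reference, estimate, eps=1e-8):
    """Simplified signal-to-distortion ratio in dB (an assumption, not
    necessarily the metric variant used by the authors)."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    distortion = reference - estimate
    # eps guards against division by zero and log of zero.
    ratio = (np.sum(reference ** 2) + eps) / (np.sum(distortion ** 2) + eps)
    return 10.0 * np.log10(ratio)


def sdr_improvement_db(reference, estimate, mixture):
    """SDR of the separated estimate minus SDR of the raw mixture."""
    return sdr_db(reference, estimate) - sdr_db(reference, mixture)


# Toy usage: the "separator" leaves 10% of the interference amplitude behind.
rng = np.random.default_rng(0)
target = rng.standard_normal(16000)
interference = rng.standard_normal(16000)
mixture = target + interference
estimate = target + 0.1 * interference
improvement = sdr_improvement_db(target, estimate, mixture)
```

In this toy case the residual interference amplitude drops by a factor of ten, which corresponds to an improvement of about 20 dB.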

Abstract

In this note, we propose to use TasTas \cite{shi2020speech} as an end-to-end approach to monaural speech separation in the pre-cocktail party problem. Our experiments on the public WSJ0-5mix data corpus result in a 10.41 dB SDR improvement. If online voice data remixing augmentation \cite{zeghidour2020wavesplit} is adopted in training, an 11.14 dB SDR improvement can be achieved. We have open-sourced our re-implementation of DPRNN-TasNet at https://github.com/ShiZiqiang/dual-path-RNNs-DPRNNs-based-speech-separation. Our TasTas is built on this implementation of DPRNN-TasNet, so we believe the results in this paper can be reproduced with ease.
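
Online remixing augmentation can be illustrated generically: instead of training on a fixed set of mixtures, the clean sources in each batch are recombined on the fly so the model sees new mixtures at every step. The sketch below is one plausible form of that idea, shuffling each source slot across the batch and summing. It is an assumption for illustration, not the exact procedure of \cite{zeghidour2020wavesplit} or of the authors, and `remix_batch` is a hypothetical name.

```python
import numpy as np


def remix_batch(sources, rng):
    """Create new mixtures by shuffling sources across a batch.

    sources: array of shape (batch, n_src, time) holding clean signals.
    Returns (mixtures, remixed_sources), where mixtures has shape
    (batch, time) and remixed_sources has the same shape as sources.
    """
    batch, n_src, _ = sources.shape
    remixed = np.empty_like(sources)
    for s in range(n_src):
        # Independent permutation per source slot, so speakers are
        # paired differently in every new mixture.
        perm = rng.permutation(batch)
        remixed[:, s] = sources[perm, s]
    # A mixture is the sample-wise sum of its sources.
    mixtures = remixed.sum(axis=1)
    return mixtures, remixed


# Toy usage: a batch of 4 examples, 5 sources each (as in WSJ0-5mix).
rng = np.random.default_rng(0)
clean = rng.standard_normal((4, 5, 100))
new_mix, new_sources = remix_batch(clean, rng)
```

Because the remixed sources are returned alongside the mixtures, they can serve directly as training targets for the new mixtures.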

Code & Models

Repositories

ShiZiqiang/dual-path-RNNs-DPRNNs-based-speech-separation
PyTorch · Official

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing