Speech Separation Based on Multi-Stage Elaborated Dual-Path Deep BiLSTM with Auxiliary Identity Loss

Ziqiang Shi; Rujie Liu; Jiqing Han

arXiv:2008.03149 · eess.AS · August 10, 2020 · 1 citation

TL;DR

This paper introduces TasTas, a novel multi-stage dual-path BiLSTM network with auxiliary identity loss, achieving state-of-the-art results in monaural speech separation by iteratively refining separated signals and enforcing speaker identity consistency.

Contribution

The work extends dual-path BiLSTM with iterative multi-stage refinement and an auxiliary speaker identity loss, improving speech separation performance on the WSJ0-2mix benchmark.

Findings

01  Achieved 20.55 dB SDR improvement on WSJ0-2mix

02  Attained 20.35 dB SI-SDR improvement

03  Reached an ESTOI score of 94.86% (ESTOI is an objective intelligibility measure, not an accuracy)
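
For readers unfamiliar with the metric, SI-SDR improvement is the SI-SDR of the separated signal minus the SI-SDR of the unprocessed mixture, both measured against the clean reference. The following is a minimal sketch of the standard definition in plain Python. The function names and the `eps` stabilizer are illustrative choices and are not taken from the paper's code.

```python
import math


def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant SDR in dB between two equal-length sample sequences."""
    n = len(target)
    # Remove the mean from both signals, as is conventional for SI-SDR.
    mean_e = sum(estimate) / n
    mean_t = sum(target) / n
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]

    # Project the estimate onto the target to get the scaled target component.
    dot = sum(a * b for a, b in zip(e, t))
    target_energy = sum(b * b for b in t)
    scale = dot / (target_energy + eps)
    projection = [scale * b for b in t]

    # Everything not explained by the scaled target counts as distortion.
    distortion = [a - p for a, p in zip(e, projection)]
    signal_power = sum(p * p for p in projection)
    distortion_power = sum(d * d for d in distortion)
    return 10.0 * math.log10((signal_power + eps) / (distortion_power + eps))


def si_sdr_improvement(estimate, target, mixture):
    """Gain in SI-SDR of the separated estimate over the raw mixture."""
    return si_sdr(estimate, target) - si_sdr(mixture, target)
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is what distinguishes SI-SDR from plain SDR.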

Abstract

Deep neural networks built from dual-path bi-directional long short-term memory (BiLSTM) blocks have proven very effective in sequence modeling, especially in speech separation. This work investigates how to extend dual-path BiLSTM into a new state-of-the-art approach, called TasTas, for multi-talker monaural speech separation (a.k.a. the cocktail party problem). To boost the performance of dual-path BiLSTM based networks, TasTas introduces two simple but effective improvements: an iterative multi-stage refinement scheme, and a speaker identity consistency loss between the separated speech and the original speech, which corrects imperfectly separated speech. TasTas takes the mixed utterance of two speakers and maps it to two separated utterances, where each utterance contains only one speaker's voice. Our experiments on the notable benchmark WSJ0-2mix…
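
One way to picture how the two improvements described in the abstract could combine into a single training objective is sketched below. This is an illustration under stated assumptions, not the authors' implementation: the use of cosine distance between speaker embeddings, the `identity_weight` value, the plain sum over stages, and all function names are hypothetical, and the speaker embeddings are assumed to come from some external speaker encoder that is not shown.

```python
import math


def cosine_distance(u, v, eps=1e-8):
    """1 - cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v + eps)


def identity_loss(separated_embs, reference_embs):
    """Mean embedding distance over speakers (assumed form of the identity term)."""
    pairs = list(zip(separated_embs, reference_embs))
    return sum(cosine_distance(s, r) for s, r in pairs) / len(pairs)


def multi_stage_objective(stages, reference_embs, identity_weight=0.1):
    """Sum, over refinement stages, of separation loss plus weighted identity loss.

    `stages` holds one (separation_loss, separated_embs) tuple per stage, where
    separation_loss is a scalar such as negative SI-SDR and separated_embs are
    the (hypothetical) speaker embeddings of that stage's separated outputs.
    """
    total = 0.0
    for separation_loss, separated_embs in stages:
        total += separation_loss
        total += identity_weight * identity_loss(separated_embs, reference_embs)
    return total


# Toy usage: two speakers, two stages; the second stage matches the references.
reference_embs = [[1.0, 0.0], [0.0, 1.0]]
stages = [
    (-10.0, [[0.8, 0.6], [0.6, 0.8]]),
    (-15.0, [[1.0, 0.0], [0.0, 1.0]]),
]
total_loss = multi_stage_objective(stages, reference_embs)
```

Supervising every stage, rather than only the last, is a common choice for iterative refinement because it gives each stage a direct training signal. Whether TasTas weights its stages equally is not stated in the text above.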

Code & Models

Repositories

ShiZiqiang/dual-path-RNNs-DPRNNs-based-speech-separation (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing