Single-Channel Multi-Speaker Separation using Deep Clustering

Yusuf Isik; Jonathan Le Roux; Zhuo Chen; Shinji Watanabe; John R. Hershey

arXiv:1607.02173·cs.LG·July 11, 2016

TL;DR

This paper advances multi-speaker speech separation by combining deep clustering with an end-to-end signal approximation objective, raising signal-to-distortion ratio (SDR), lowering word error rate (WER), and making separation more effective in cocktail party scenarios.

Contribution

It introduces an end-to-end training framework with a new signal approximation objective that enhances deep clustering for multi-speaker separation.

Findings

01 SDR improvement for two-speaker separation rises from 6.0 dB (baseline) to 10.3 dB

02 WER falls from 89.1% to 30.8% with the new method

03 Enhanced model achieves state-of-the-art separation performance

Abstract

Deep clustering is a recently introduced deep learning architecture that uses discriminatively trained embeddings as the basis for clustering. It was recently applied to spectrogram segmentation, yielding impressive results on speaker-independent multi-speaker separation. In this paper we extend the baseline system with an end-to-end signal approximation objective that greatly improves performance on a challenging speech separation task. We first significantly improve upon the baseline system performance by incorporating better regularization, larger temporal context, and a deeper architecture, culminating in an overall improvement in signal-to-distortion ratio (SDR) of 10.3 dB compared to the baseline of 6.0 dB for two-speaker separation, as well as a 7.1 dB SDR improvement for three-speaker separation. We then extend the model to incorporate an enhancement layer to refine the signal…
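
The SDR figures quoted in the abstract are improvements: the SDR of the separated output minus the SDR of the unprocessed mixture, both measured against the clean target. The sketch below illustrates that arithmetic on toy signals. It is a simplified energy-ratio SDR, not the BSS Eval metric typically used in separation papers, and every name in it (`sdr_db`, `sdr_improvement_db`, the sinusoidal "speakers", the 10% leakage) is an assumption made for illustration, not something taken from the paper.

```python
import math


def sdr_db(reference, estimate):
    """Simplified SDR in dB: reference energy divided by residual energy.

    Assumption: a plain energy ratio, not the full BSS Eval decomposition.
    """
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if error_energy == 0.0:
        return math.inf
    return 10.0 * math.log10(signal_energy / error_energy)


def sdr_improvement_db(reference, mixture, estimate):
    """SDR gain of the separated estimate over the raw mixture."""
    return sdr_db(reference, estimate) - sdr_db(reference, mixture)


# Toy stand-ins for two speakers (hypothetical signals, not real speech).
speaker_a = [math.sin(0.1 * n) for n in range(400)]
speaker_b = [math.sin(0.37 * n + 1.0) for n in range(400)]
mixture = [a + b for a, b in zip(speaker_a, speaker_b)]

# Hypothetical separator output: the target plus 10% of the interferer.
estimate = [a + 0.1 * b for a, b in zip(speaker_a, speaker_b)]

improvement = sdr_improvement_db(speaker_a, mixture, estimate)
print(f"SDR improvement: {improvement:.1f} dB")
```

Because the residual amplitude drops by a factor of 10 in this toy case, the residual energy drops by a factor of 100, which corresponds to a 20 dB improvement regardless of the target's own energy.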
