Deep attractor network for single-microphone speaker separation

Zhuo Chen; Yi Luo; Nima Mesgarani

arXiv:1611.08930·cs.SD·November 30, 2017

TL;DR

This paper introduces a deep learning framework using attractor points in embedding space for single-microphone speaker separation, effectively handling arbitrary source permutations and unknown source counts.

Contribution

It presents an end-to-end trainable model that does not depend on the number of sources in the mixture, and it introduces attractor-formation strategies that are feasible for real-time speech separation.

Findings

01 Reported a 5.49% improvement over the previous state-of-the-art methods.

02 Proposed a permutation-invariant separation approach.

03 Evaluated on the Wall Street Journal dataset.

Abstract

Despite the overwhelming success of deep learning in various speech processing tasks, the problem of separating simultaneous speakers in a mixture remains challenging. Two major difficulties in such systems are the arbitrary source permutation and unknown number of sources in the mixture. We propose a novel deep learning framework for single channel speech separation by creating attractor points in high dimensional embedding space of the acoustic signals which pull together the time-frequency bins corresponding to each source. Attractor points in this study are created by finding the centroids of the sources in the embedding space, which are subsequently used to determine the similarity of each bin in the mixture to each source. The network is then trained to minimize the reconstruction error of each source by optimizing the embeddings. The proposed model is different from prior works…
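The attractor step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the array shapes, the softmax used to turn similarities into masks, and the mean squared error are all assumptions made for illustration. Attractors are computed as per-source centroids of the embeddings, each time-frequency bin is scored against every attractor by a dot product, and the resulting masks are applied to the mixture to measure reconstruction error.

```python
import numpy as np


def attractor_masks(embeddings, assignments):
    """Form attractors as source centroids and derive soft masks.

    embeddings:  (n_bins, emb_dim) embedding of each time-frequency bin.
    assignments: (n_bins, n_sources) one-hot source membership per bin
                 (assumed to be known at training time).
    """
    # Number of bins assigned to each source, shaped (n_sources, 1).
    counts = assignments.sum(axis=0, keepdims=True).T
    # Centroid of the embeddings belonging to each source.
    attractors = (assignments.T @ embeddings) / np.maximum(counts, 1e-8)
    # Dot-product similarity of every bin to every attractor.
    similarity = embeddings @ attractors.T
    # Softmax over sources (an assumed choice of mask function).
    similarity = similarity - similarity.max(axis=1, keepdims=True)
    exp_sim = np.exp(similarity)
    masks = exp_sim / exp_sim.sum(axis=1, keepdims=True)
    return attractors, masks


def reconstruction_error(mixture_mag, source_mags, masks):
    """Mean squared error between true and mask-estimated source magnitudes.

    mixture_mag: (n_bins,) magnitude of the mixture per bin.
    source_mags: (n_bins, n_sources) magnitude of each clean source per bin.
    """
    estimates = mixture_mag[:, None] * masks
    return float(np.mean((source_mags - estimates) ** 2))
```

In a trainable model, the embeddings would come from a network and this error would be backpropagated through the masks and attractors, which is how the embeddings are optimized for reconstruction.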

Code & Models

Repositories

KMASAHIRO/DANet (TensorFlow)
