Multi-channel Speech Separation Using Deep Embedding Model with Multilayer Bootstrap Networks

Ziye Yang; Xiao-Lei Zhang

arXiv:1910.10912·cs.SD·October 25, 2019·1 cites


Open Access

TL;DR

This paper introduces DPCL++, an improved deep clustering method for speech separation that employs multilayer bootstrap networks to enhance robustness in reverberant environments, especially when training and testing conditions differ.

Contribution

The paper proposes integrating multilayer bootstrap networks into deep clustering to reduce noise and variations in embeddings, improving speech separation in challenging environments.

Findings

01. Enhanced separation accuracy in reverberant environments

02. Robustness to environment mismatch demonstrated

03. Effective noise reduction in embedding vectors

Abstract

Recently, deep clustering (DPCL) based speaker-independent speech separation has drawn much attention, since it needs little prior information about the speakers. However, it still has much room for improvement, particularly in reverberant environments. If the training and test environments are mismatched, which is a common case, the embedding vectors produced by DPCL may contain much noise and many small variations. To deal with this problem, we propose a variant of DPCL, named DPCL++, which applies a recent unsupervised deep learning method, multilayer bootstrap networks (MBN), to further reduce the noise and small variations of the embedding vectors in an unsupervised way at the test stage, which facilitates k-means in producing a good result. MBN builds a gradually narrowed network from the bottom up via a stack of k-centroids clustering ensembles, where the k-centroids clusterings are trained independently by…
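The test-stage idea in the abstract (pass the embedding vectors through a stack of k-centroids clustering ensembles that narrows layer by layer, then run k-means on the result) can be illustrated with a rough NumPy sketch. This is not the authors' implementation. The abstract is truncated before it says how each clustering is trained, so the sketch assumes each clustering picks k random input rows as centroids and encodes every row as a one-hot indicator of its nearest centroid. The function names (`mbn_layer`, `mbn_transform`, `kmeans`), the layer sizes, and the toy data are all hypothetical, and any further steps the paper may use are omitted.

```python
import numpy as np


def mbn_layer(X, k, n_clusterings, rng):
    """One ensemble layer of k-centroids clusterings (illustrative only).

    Assumption: each clustering uses k randomly sampled rows of X as its
    centroids and outputs a one-hot code of the nearest centroid per row.
    """
    n = X.shape[0]
    codes = []
    for _ in range(n_clusterings):
        centroids = X[rng.choice(n, size=k, replace=False)]
        # Squared Euclidean distance from every row to every centroid: (n, k)
        d = ((X ** 2).sum(axis=1)[:, None]
             - 2.0 * X @ centroids.T
             + (centroids ** 2).sum(axis=1)[None, :])
        onehot = np.zeros((n, k))
        onehot[np.arange(n), d.argmin(axis=1)] = 1.0
        codes.append(onehot)
    # Concatenate the one-hot codes of all clusterings in the ensemble
    return np.hstack(codes)


def mbn_transform(X, ks=(40, 20, 10), n_clusterings=20, seed=0):
    """Stack layers with decreasing k (hypothetical sizes), bottom-up."""
    rng = np.random.default_rng(seed)
    H = X
    for k in ks:
        H = mbn_layer(H, k, n_clusterings, rng)
    return H


def kmeans(X, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd's k-means, used here as the final clustering step."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(X.shape[0], size=n_clusters, replace=False)]
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels


if __name__ == "__main__":
    # Toy stand-in for noisy per-bin embeddings of two speakers
    rng = np.random.default_rng(1)
    emb = np.vstack([rng.normal(0.0, 1.0, (200, 20)),
                     rng.normal(3.0, 1.0, (200, 20))])
    H = mbn_transform(emb)          # denoised, sparse representation
    labels = kmeans(H, n_clusters=2)
    print(H.shape, np.bincount(labels))
```

In a separation pipeline the rows of `emb` would be the per time-frequency-bin embeddings produced by the DPCL network, and the resulting labels would define one binary mask per speaker; that wiring is left out here.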

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing

Methods: Test