Orthonormal Embedding-based Deep Clustering for Single-channel Speech Separation

Soyeon Choe; Soo-Whan Chung; Youna Ji; Hong-Goo Kang

arXiv:1901.04690 · eess.AS · January 16, 2019 · 1 citation

TL;DR

This paper introduces an enhanced deep clustering method with a regularization term that reduces the correlation among embedding dimensions, leading to improved speech separation performance in single-channel scenarios.

Contribution

The paper proposes a new regularization term for deep clustering that mitigates the permutation problem and improves the decomposition of spectral bins, yielding better speech separation.
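The objective can be written out as follows. The first line is the standard deep clustering loss, where V is the N x D matrix of unit-norm embeddings for the N time-frequency bins and Y is the N x C one-hot source-assignment matrix. The regularizer R(V) and its weight λ are assumptions for illustration only. They show one plausible way to "reduce correlation between embedding dimensions" and are not the paper's confirmed formula. Because every row of V has unit norm, trace(VᵀV) = N. The only orthonormal-style target with a consistent scale is therefore (N/D) I_D, which explains the D/N factor.

```latex
% Baseline deep clustering loss, expanded so no N x N matrix is ever formed
\mathcal{L}_{\mathrm{DC}} = \lVert VV^\top - YY^\top \rVert_F^2
  = \lVert V^\top V \rVert_F^2 - 2\,\lVert V^\top Y \rVert_F^2 + \lVert Y^\top Y \rVert_F^2
% Assumed orthonormality regularizer (illustrative form, not taken from the paper)
\mathcal{R}(V) = \Bigl\lVert \tfrac{D}{N}\, V^\top V - I_D \Bigr\rVert_F^2,
\qquad
\mathcal{L} = \mathcal{L}_{\mathrm{DC}} + \lambda\, \mathcal{R}(V)
```

R(V) is zero exactly when the D embedding columns are mutually orthogonal and carry equal energy. This is the sense in which the embedding dimensions become decorrelated.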

Findings

1. Outperforms conventional deep clustering in signal-to-distortion ratio (SDR)
2. Effective across varying embedding dimensions and signal-to-interference ratio (SIR) levels
3. Reduces permutation problems in source separation

Abstract

Deep clustering is a deep neural network-based speech separation algorithm that first maps each time-frequency bin of the mixed signal to a high-dimensional embedding and then applies a clustering algorithm to those embeddings to separate the sources in the mixture. In this paper, we extend the baseline deep clustering criterion with an additional regularization term to further improve overall performance. This term constrains the embeddings so that their dimensions are less correlated with one another, leading to better decomposition of the spectral bins. The regularization term also helps to mitigate the unavoidable permutation problem in the conventional deep clustering method, yielding better clustering through the formation of optimal embeddings. We evaluate the results by varying the embedding dimension, the signal-to-interference ratio (SIR), and gender dependency. The…
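The two stages described in the abstract can be sketched in NumPy. The first stage is a training loss that adds a regularizer to the baseline deep clustering term. The second stage clusters the embeddings into binary time-frequency masks at inference time. This is a minimal illustration under several assumptions, none of which come from the paper: the regularizer form ||(D/N)VᵀV - I||², the weight `lam`, plain Lloyd k-means as the clustering algorithm, and all function names (`unit_normalize`, `dc_loss`, `orthonormal_penalty`, `total_loss`, `kmeans_masks`) are hypothetical. The network that produces V is omitted, and random embeddings stand in for its output.

```python
import numpy as np


def unit_normalize(V, eps=1e-8):
    # Deep clustering embeddings are unit-norm per time-frequency bin (row of V).
    return V / (np.linalg.norm(V, axis=1, keepdims=True) + eps)


def dc_loss(V, Y):
    """Baseline loss ||VV^T - YY^T||_F^2 via the low-rank expansion,
    so only D x D, D x C and C x C products are formed (never N x N)."""
    return (np.sum((V.T @ V) ** 2)
            - 2.0 * np.sum((V.T @ Y) ** 2)
            + np.sum((Y.T @ Y) ** 2))


def orthonormal_penalty(V):
    """ASSUMED regularizer ||(D/N) V^T V - I_D||_F^2 (not the paper's confirmed form).
    Zero when the D embedding dimensions are orthogonal with equal energy."""
    N, D = V.shape
    G = (D / N) * (V.T @ V)
    return np.sum((G - np.eye(D)) ** 2)


def total_loss(V, Y, lam=0.1):
    # lam is a hypothetical weight chosen for illustration.
    return dc_loss(V, Y) + lam * orthonormal_penalty(V)


def kmeans_masks(V, n_sources, n_iter=50, seed=0):
    """Cluster TF-bin embeddings with plain Lloyd k-means and return
    binary masks of shape (n_sources, N), one per estimated source."""
    rng = np.random.default_rng(seed)
    # Fancy indexing copies, so updating centers never modifies V.
    centers = V[rng.choice(len(V), size=n_sources, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every bin to every center: (N, n_sources).
        d = ((V[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for k in range(n_sources):
            members = V[labels == k]
            if len(members):  # keep the previous center if a cluster empties
                centers[k] = members.mean(axis=0)
    return np.stack([(labels == k).astype(float) for k in range(n_sources)])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    N, D, C = 200, 20, 2  # toy sizes: TF bins, embedding dim, sources
    V = unit_normalize(rng.standard_normal((N, D)))  # stand-in for network output
    Y = np.eye(C)[rng.integers(0, C, size=N)]        # random one-hot targets
    print("loss:", total_loss(V, Y))
    masks = kmeans_masks(V, C)
    print("mask shape:", masks.shape)
```

The expanded form of the baseline loss matters in practice because N, the number of time-frequency bins, is far larger than D. Memory then grows with D² rather than N². Each bin receives exactly one cluster label, so the returned masks sum to one over the sources at every bin.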

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Blind Source Separation Techniques