Heterogeneous Separation Consistency Training for Adaptation of Unsupervised Speech Separation

Jiangyu Han; Yanhua Long

arXiv:2204.11032·eess.AS·August 9, 2022

Open Access

TL;DR

This paper introduces separation consistency training (SCT), an unsupervised training method for speech separation that uses heterogeneous models and real-world unlabeled mixtures to improve separation performance in real conditions.

Contribution

The study proposes a heterogeneous separation consistency training framework that iteratively refines models using pseudo labels from real mixtures and cross-knowledge adaptation.
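This page does not say how pseudo labels are chosen from real mixtures. As a minimal sketch of one plausible reading, the code below keeps a mixture only when two different separators produce outputs that agree with each other, measured by permutation-invariant SI-SNR. The function names, the SI-SNR agreement measure, the 5 dB threshold, and the choice to keep model A's outputs as the pseudo labels are all assumptions made for illustration, not details confirmed by the source.

```python
import math
from itertools import permutations


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length sequences."""
    n = len(reference)
    mean_e = sum(estimate) / n
    mean_r = sum(reference) / n
    e = [x - mean_e for x in estimate]
    r = [x - mean_r for x in reference]
    # Project the estimate onto the reference to get the "target" component.
    dot = sum(a * b for a, b in zip(e, r))
    r_energy = sum(b * b for b in r) + eps
    target = [dot / r_energy * b for b in r]
    noise = [a - t for a, t in zip(e, target)]
    t_energy = sum(t * t for t in target)
    n_energy = sum(x * x for x in noise)
    return 10.0 * math.log10((t_energy + eps) / (n_energy + eps))


def consistency(outputs_a, outputs_b):
    """Best mean SI-SNR over all speaker orderings of model B's outputs.

    Returns (score_db, perm) where perm[i] is the index in outputs_b
    matched to outputs_a[i]. Hypothetical helper, not from the paper.
    """
    best_score, best_perm = -math.inf, None
    for perm in permutations(range(len(outputs_b))):
        score = sum(
            si_snr(outputs_a[i], outputs_b[j]) for i, j in enumerate(perm)
        ) / len(perm)
        if score > best_score:
            best_score, best_perm = score, perm
    return best_score, best_perm


def select_pseudo_labels(items, threshold_db=5.0):
    """Keep mixtures whose two model outputs agree above a threshold.

    items: iterable of (mixture_id, outputs_a, outputs_b).
    threshold_db is an assumed value chosen only for this sketch.
    """
    selected = []
    for mixture_id, outputs_a, outputs_b in items:
        score, _ = consistency(outputs_a, outputs_b)
        if score >= threshold_db:
            # Assumption: model A's outputs serve as the pseudo labels.
            selected.append((mixture_id, outputs_a))
    return selected
```

In an iterative scheme of this kind, the selected mixtures and their pseudo labels would be used to fine-tune the models, after which selection would be repeated with the updated outputs.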

Findings

1. Improved separation accuracy on real-world speech mixtures.

2. Effective use of unlabeled data with pseudo labeling.

3. Slight performance gains from linear fusion of heterogeneous outputs.
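To make the last finding concrete, linear fusion of two separators' outputs can be sketched as a per-sample weighted average. This sketch assumes the two models' outputs are already aligned in the same speaker order and have equal lengths. The name `fuse_linear` and the weight `alpha` are hypothetical, and the page does not state which weights the authors used.

```python
def fuse_linear(outputs_a, outputs_b, alpha=0.5):
    """Weighted average of two models' separated signals.

    outputs_a, outputs_b: lists of per-speaker sample lists, assumed to be
    in the same speaker order. alpha weights model A, (1 - alpha) model B.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(outputs_a) != len(outputs_b):
        raise ValueError("both models must output the same number of speakers")
    fused = []
    for sig_a, sig_b in zip(outputs_a, outputs_b):
        if len(sig_a) != len(sig_b):
            raise ValueError("signals for the same speaker must match in length")
        fused.append([alpha * x + (1.0 - alpha) * y for x, y in zip(sig_a, sig_b)])
    return fused
```

For example, `fuse_linear([[1.0, 2.0]], [[3.0, 4.0]], alpha=0.25)` weights the second model's output three times as heavily as the first.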

Abstract

Recently, supervised speech separation has made great progress. However, limited by the nature of supervised training, most existing separation methods require ground-truth sources and are trained on synthetic datasets. This reliance on ground truth is problematic, because the ground-truth signals are usually unavailable in real conditions. Moreover, in many industry scenarios, the real acoustic characteristics deviate far from those in simulated datasets. Therefore, performance usually degrades significantly when supervised speech separation models are applied to real applications. To address these problems, in this study, we propose a novel separation consistency training, termed SCT, to exploit real-world unlabeled mixtures for improving cross-domain unsupervised speech separation in an iterative manner, by leveraging the complementary information obtained from…

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing