MixCycle: Unsupervised Speech Separation via Cyclic Mixture Permutation Invariant Training

Ertuğ Karamatlı; Serap Kırbız

arXiv:2202.03875 · eess.AS · January 11, 2023

Open Access · 1 Repo

TL;DR

This paper presents MixCycle, an unsupervised speech separation method that uses cyclic training to improve separation quality, approaching supervised performance without needing reference sources.

Contribution

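The paper introduces two unsupervised methods for single-channel, two-source speech separation: MixPIT, a proxy training task that needs no reference sources, and MixCycle, a cyclic training framework built on MixPIT that outperforms the unsupervised MixIT baseline. It also includes a self-evaluation technique.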

Findings

01

MixCycle outperforms MixIT on single-channel, two-source speech separation.

02

MixCycle achieves near-supervised performance levels.

03

The self-evaluation method reliably estimates model performance.

Abstract

We introduce two unsupervised source separation methods, which involve self-supervised training from single-channel two-source speech mixtures. Our first method, mixture permutation invariant training (MixPIT), enables learning a neural network model which separates the underlying sources via a challenging proxy task without supervision from the reference sources. Our second method, cyclic mixture permutation invariant training (MixCycle), uses MixPIT as a building block in a cyclic fashion for continuous learning. MixCycle gradually converts the problem from separating mixtures of mixtures into separating single mixtures. We compare our methods to common supervised and unsupervised baselines: permutation invariant training with dynamic mixing (PIT-DM) and mixture invariant training (MixIT). We show that MixCycle outperforms MixIT and reaches a performance level very close to the…
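
The MixPIT proxy task described in the abstract can be illustrated with a small sketch. It rests on several assumptions the abstract does not confirm: that the model input is the sum of two mixtures, that the two model outputs are scored against those two mixtures with a permutation invariant loss, and that the loss is negative SNR. The function names (`neg_snr_db`, `pit_loss`, `mix`) and the toy signals are hypothetical and are not taken from the authors' repository.

```python
import math
from itertools import permutations


def neg_snr_db(target, estimate, eps=1e-8):
    """Negative signal-to-noise ratio in dB (lower is better). Assumed loss."""
    signal = sum(t * t for t in target)
    noise = sum((t - e) ** 2 for t, e in zip(target, estimate))
    return -10.0 * math.log10((signal + eps) / (noise + eps))


def pit_loss(targets, estimates):
    """Return (loss, perm) for the output ordering with the lowest mean loss.

    perm[i] is the index of the estimate matched to targets[i].
    """
    best = None
    for perm in permutations(range(len(estimates))):
        loss = sum(
            neg_snr_db(t, estimates[p]) for t, p in zip(targets, perm)
        ) / len(targets)
        if best is None or loss < best[0]:
            best = (loss, perm)
    return best


def mix(a, b):
    """Sample-wise sum of two equal-length signals."""
    return [x + y for x, y in zip(a, b)]


# Two toy single-channel mixtures; their underlying sources are never used.
x1 = [0.5, -0.2, 0.1, 0.4]
x2 = [-0.3, 0.6, 0.2, -0.1]

# Mixture of mixtures: what a model would receive as input in this sketch.
mixture_of_mixtures = mix(x1, x2)

# Stand-in for model outputs: the two mixtures in swapped order, slightly off.
estimates = [[v + 0.01 for v in x2], [v - 0.01 for v in x1]]

# The targets are the input mixtures themselves, not reference sources.
loss, perm = pit_loss([x1, x2], estimates)
print(f"best permutation: {perm}, loss: {loss:.2f} dB")
```

Because the loss searches over output orderings, the model is not penalized for emitting the two mixtures in either order. In an actual training loop, `estimates` would come from a separation network applied to `mixture_of_mixtures`, and the loss would be computed with a differentiable framework such as PyTorch.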

Code & Models

Repositories

ertug/mixcycle
PyTorch · Official

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Voice and Speech Disorders