MixCycle: Unsupervised Speech Separation via Cyclic Mixture Permutation Invariant Training
Ertuğ Karamatlı, Serap Kırbız

TL;DR
This paper presents MixCycle, an unsupervised speech separation method that uses cyclic training to improve separation quality, approaching supervised performance without needing reference sources.
Contribution
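The paper introduces two unsupervised training methods for single-channel two-source speech separation: MixPIT, which trains on mixtures of mixtures as a proxy task, and MixCycle, which applies MixPIT cyclically so that the task gradually shifts toward separating single mixtures. MixCycle outperforms MixIT, and the paper also includes a self-evaluation technique for estimating model performance.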
Findings
MixCycle outperforms MixIT in speech separation tasks.
MixCycle achieves near-supervised performance levels.
The self-evaluation method reliably estimates model performance.
Abstract
We introduce two unsupervised source separation methods, which involve self-supervised training from single-channel two-source speech mixtures. Our first method, mixture permutation invariant training (MixPIT), enables learning a neural network model which separates the underlying sources via a challenging proxy task without supervision from the reference sources. Our second method, cyclic mixture permutation invariant training (MixCycle), uses MixPIT as a building block in a cyclic fashion for continuous learning. MixCycle gradually converts the problem from separating mixtures of mixtures into separating single mixtures. We compare our methods to common supervised and unsupervised baselines: permutation invariant training with dynamic mixing (PIT-DM) and mixture invariant training (MixIT). We show that MixCycle outperforms MixIT and reaches a performance level very close to the…
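The proxy task described in the abstract can be illustrated with a small sketch. It assumes, as one reading of "separating mixtures of mixtures", that the model input is the sum of two mixtures and that the two mixtures themselves are the training targets under a permutation invariant loss. The function names (`neg_snr`, `pit_loss`, `mix`), the negative-SNR loss, and the toy signals are illustrative assumptions and are not taken from the paper; the hard-coded `estimates` stand in for the outputs of a separation network.

```python
import itertools
import math


def neg_snr(reference, estimate, eps=1e-8):
    """Negative signal-to-noise ratio in dB (lower is better). Assumed loss."""
    signal = sum(r * r for r in reference)
    noise = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return -10.0 * math.log10((signal + eps) / (noise + eps))


def pit_loss(targets, estimates, loss_fn=neg_snr):
    """Return the minimum mean loss over all assignments of estimates to
    targets, together with the winning permutation."""
    best_loss, best_perm = float("inf"), None
    for perm in itertools.permutations(range(len(estimates))):
        loss = sum(loss_fn(t, estimates[p]) for t, p in zip(targets, perm))
        loss /= len(targets)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


def mix(a, b):
    """Sample-wise sum of two equal-length signals."""
    return [x + y for x, y in zip(a, b)]


# Two toy "mixtures" (hypothetical data); no reference sources are used.
mixture_a = [0.5, -0.2, 0.1, 0.4]
mixture_b = [-0.3, 0.6, 0.2, -0.1]

# Input to the model in the proxy task: a mixture of mixtures.
mixture_of_mixtures = mix(mixture_a, mixture_b)

# Stand-in for model outputs: slightly noisy and in swapped order.
estimates = [
    [-0.29, 0.61, 0.19, -0.1],
    [0.5, -0.21, 0.1, 0.41],
]

# The mixtures act as targets; the loss ignores the output ordering.
loss, perm = pit_loss([mixture_a, mixture_b], estimates)
```

Here `perm` records which estimate was matched to each target, so the loss does not penalize the model for emitting its outputs in a different order than the targets.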
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Voice and Speech Disorders
