Unsupervised Sound Separation Using Mixture Invariant Training

Scott Wisdom; Efthymios Tzinis; Hakan Erdogan; Ron J. Weiss; Kevin Wilson; John R. Hershey

arXiv:2006.12701 · eess.AS · October 27, 2020 · 91 cites

TL;DR

This paper introduces MixIT, an unsupervised training method for sound separation that learns from real-world mixtures without needing ground-truth sources, enabling better domain adaptation and performance in diverse acoustic conditions.

Contribution

The paper presents MixIT, a novel unsupervised training approach for sound separation that operates solely on mixtures, reducing reliance on synthetic data and ground-truth sources.

Findings

01

Achieves competitive speech separation performance without supervised data.

02

Enables effective domain adaptation using real-world mixtures.

03

Improves reverberant speech separation and sound separation with large in-the-wild datasets.

Abstract

In recent years, rapid progress has been made on the problem of single-channel sound separation using supervised training of deep neural networks. In such supervised approaches, a model is trained to predict the component sources from synthetic mixtures created by adding up isolated ground-truth sources. Reliance on this synthetic training data is problematic because good performance depends upon the degree of match between the training data and real-world audio, especially in terms of the acoustic conditions and distribution of sources. The acoustic properties can be challenging to accurately simulate, and the distribution of sound types may be hard to replicate. In this paper, we propose a completely unsupervised method, mixture invariant training (MixIT), that requires only single-channel acoustic mixtures. In MixIT, training examples are constructed by mixing together existing…
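A minimal sketch of the mixture-of-mixtures idea behind MixIT, as it is commonly described: two existing mixtures are summed, a separation model estimates M sources from the sum, and the loss is taken under the best assignment of each estimated source to one of the two original mixtures. This is an illustration under stated assumptions, not the authors' implementation: the function name `mixit_loss` is hypothetical, mean squared error stands in for whatever signal-level loss is actually used, and the separation model itself is left out (the caller supplies its output).

```python
import itertools

import numpy as np


def mixit_loss(mixtures, est_sources):
    """Illustrative mixture-invariant loss (hypothetical helper).

    mixtures:    array of shape (2, T), the two reference mixtures.
    est_sources: array of shape (M, T), sources estimated from their sum.
    Returns (best_loss, best_assignment), where best_assignment[m] is the
    index of the mixture that estimated source m is assigned to.
    """
    num_mix, _ = mixtures.shape
    num_src, _ = est_sources.shape
    best_loss, best_assign = np.inf, None

    # Exhaustively try every way of assigning each source to one mixture.
    for assign in itertools.product(range(num_mix), repeat=num_src):
        # Binary mixing matrix: each column has exactly one 1.
        mixing = np.zeros((num_mix, num_src))
        mixing[np.asarray(assign), np.arange(num_src)] = 1.0

        # Remix the estimated sources and compare with the references.
        remixed = mixing @ est_sources
        # Assumption: plain MSE is used here for simplicity.
        loss = float(np.mean((mixtures - remixed) ** 2))

        if loss < best_loss:
            best_loss, best_assign = loss, assign

    return best_loss, best_assign
```

The exhaustive search visits 2^M assignments, which is only practical for a small number of output sources. No ground-truth sources appear anywhere in the loss; only the two reference mixtures are needed.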

Videos

Unsupervised Sound Separation Using Mixture Invariant Training · SlidesLive

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis