Sparse, Efficient, and Semantic Mixture Invariant Training: Taming In-the-Wild Unsupervised Sound Separation
Scott Wisdom, Aren Jansen, Ron J. Weiss, Hakan Erdogan, John R. Hershey

TL;DR
This paper introduces novel loss functions and an efficient training method for unsupervised sound separation, addressing over-separation and computational challenges in mixture invariant training, leading to improved separation performance on in-the-wild data.
Contribution
It proposes sparsity and covariance losses to reduce over-separation, and an efficient approximation of the MixIT loss that scales to larger numbers of output sources, improving unsupervised sound separation.
Findings
Achieved over 13 dB SI-SNR improvement on the FUSS test set.
Boosted single-source SI-SNR by over 17 dB.
Demonstrated the effectiveness of the proposed losses and the efficient MixIT loss on real-world data.
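For readers unfamiliar with the metric behind these numbers, the following is a minimal sketch of scale-invariant signal-to-noise ratio (SI-SNR) and its improvement over the unprocessed mixture. The function names, the mean removal step, and the `eps` stabilizer are illustrative assumptions; this is not the paper's evaluation code, and its exact conventions may differ.

```python
import numpy as np


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between 1-D signals (hypothetical helper)."""
    # Assumption: remove the mean first, a common but not universal convention.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to find the optimally scaled target.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    residual = estimate - target
    return 10.0 * np.log10(
        (np.dot(target, target) + eps) / (np.dot(residual, residual) + eps)
    )


def si_snr_improvement(estimate, reference, mixture, eps=1e-8):
    """SI-SNR of the estimate minus SI-SNR of the raw mixture, in dB."""
    return si_snr(estimate, reference, eps) - si_snr(mixture, reference, eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.standard_normal(1000)
    interference = rng.standard_normal(1000)
    mix = ref + interference          # unprocessed input
    est = ref + 0.1 * interference    # a pretend separated output
    print(si_snr_improvement(est, ref, mix))
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score unchanged, which is why the metric suits separation models whose output gain is arbitrary.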
Abstract
Supervised neural network training has led to significant progress on single-channel sound separation. This approach relies on ground truth isolated sources, which precludes scaling to widely available mixture data and limits progress on open-domain tasks. The recent mixture invariant training (MixIT) method enables training on in-the-wild data; however, it suffers from two outstanding problems. First, it produces models which tend to over-separate, producing more output sources than are present in the input. Second, the exponential computational complexity of the MixIT loss limits the number of feasible output sources. In this paper we address both issues. To combat over-separation we introduce new losses: sparsity losses that favor fewer output sources and a covariance loss that discourages correlated outputs. We also experiment with a semantic classification loss by predicting weak…
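The exponential cost mentioned in the abstract can be illustrated with a brute-force sketch of a MixIT-style loss. It assumes the standard MixIT setup in which each of M separated sources is assigned to exactly one of N reference mixtures, so N^M assignments must be scored (2^M for the usual two-mixture case). The names `mixit_loss_bruteforce` and `neg_snr_loss` are hypothetical, the plain negative SNR is a stand-in for whatever signal-level loss is actually used, and this is not the efficient approximation the paper proposes.

```python
import itertools

import numpy as np


def neg_snr_loss(target, estimate, eps=1e-8):
    """Negative SNR in dB (stand-in signal loss, an assumption)."""
    err = target - estimate
    return -10.0 * np.log10((np.sum(target**2) + eps) / (np.sum(err**2) + eps))


def mixit_loss_bruteforce(mixtures, sources, loss_fn=neg_snr_loss):
    """Exhaustive search over source-to-mixture assignments.

    mixtures: array of shape (N, T), the reference mixtures.
    sources:  array of shape (M, T), the separated model outputs.
    Returns (best_loss, best_assignment), where best_assignment[m] is the
    index of the mixture that source m is assigned to.
    """
    n_mix, n_src = mixtures.shape[0], sources.shape[0]
    best_loss, best_assign = np.inf, None
    # N**M candidate assignments: this loop is the exponential bottleneck.
    for assign in itertools.product(range(n_mix), repeat=n_src):
        # Binary mixing matrix with exactly one 1 per column.
        mixing = np.zeros((n_mix, n_src))
        mixing[list(assign), np.arange(n_src)] = 1.0
        remixed = mixing @ sources
        loss = sum(loss_fn(mixtures[i], remixed[i]) for i in range(n_mix))
        if loss < best_loss:
            best_loss, best_assign = loss, assign
    return best_loss, best_assign


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    srcs = rng.standard_normal((4, 256))
    mixes = np.stack([srcs[0] + srcs[2], srcs[1] + srcs[3]])
    print(mixit_loss_bruteforce(mixes, srcs)[1])
```

With two mixtures, going from 4 to 16 output sources raises the number of candidate assignments from 16 to 65,536 per training example, which is the scaling problem an efficient approximation is meant to avoid.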
Taxonomy
Topics: Speech and Audio Processing · Acoustic Wave Phenomena Research · Music and Audio Processing
