Is MixIT Really Unsuitable for Correlated Sources? Exploring MixIT for Unsupervised Pre-training in Music Source Separation
Kohei Saijo, Yoshiaki Bando

TL;DR
This paper explores MixIT, an unsupervised learning method, as a pre-training strategy for music source separation (MSS). It shows that MixIT pre-training can improve MSS performance, despite the common assumption that MixIT is unsuitable for correlated sources.
Contribution
The study demonstrates that MixIT can be used effectively for unsupervised pre-training in MSS, challenging the belief that it is unsuitable because of inter-source correlation.
Findings
MixIT pre-training improves MSS performance over training from scratch.
Pre-training on unlabeled data with MixIT enhances fine-tuned MSS models.
MixIT can separate instruments to some extent despite source correlation challenges.
Abstract
In music source separation (MSS), obtaining isolated sources or stems is highly costly, making pre-training on unlabeled data a promising approach. Although source-agnostic unsupervised learning methods such as mixture-invariant training (MixIT) have been explored in general sound separation, they have been largely overlooked in MSS due to their implicit assumption of source independence. We hypothesize, however, that the difficulty of applying MixIT to MSS arises not from high inter-source correlation but from the ill-posed nature of MSS itself, where stem definitions are application-dependent and models lack explicit knowledge of what should or should not be separated. While MixIT does not assume any source model and struggles with such ambiguities, our preliminary experiments show that it can still separate instruments to some extent, suggesting its potential for unsupervised pre-training.…
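To make the MixIT objective concrete, here is a minimal NumPy sketch under stated assumptions. In MixIT, two mixtures are summed into a "mixture of mixtures", the model separates it into M estimated sources, and the loss is the best reconstruction of the two original mixtures over all ways of assigning each estimated source to exactly one of them. The negative-SNR loss with a 30 dB soft threshold, the function names `neg_snr` and `mixit_loss`, and the brute-force search over assignments are illustrative assumptions, not necessarily the configuration used in this paper.

```python
import itertools

import numpy as np


def neg_snr(ref, est, snr_max_db=30.0):
    """Negative SNR (dB) with a soft threshold tau = 10^(-snr_max_db/10).

    Assumption: this thresholded SNR loss is one common choice for MixIT;
    `ref` must have nonzero energy.
    """
    tau = 10.0 ** (-snr_max_db / 10.0)
    ref_pow = np.sum(ref ** 2)
    err_pow = np.sum((ref - est) ** 2)
    return -10.0 * np.log10(ref_pow / (err_pow + tau * ref_pow))


def mixit_loss(x1, x2, est_sources, snr_max_db=30.0):
    """MixIT loss for two reference mixtures x1, x2 of shape (T,).

    `est_sources` has shape (M, T): sources the model estimated from x1 + x2.
    Each estimated source is assigned to exactly one mixture, giving 2**M
    candidate mixing matrices; the minimum-loss assignment is returned.
    """
    m = est_sources.shape[0]
    best_loss, best_assign = np.inf, None
    for assign in itertools.product((0, 1), repeat=m):
        a = np.array(assign)
        # Remix: sum the estimated sources assigned to each mixture
        # (an empty selection sums to an all-zero signal).
        remix1 = est_sources[a == 0].sum(axis=0)
        remix2 = est_sources[a == 1].sum(axis=0)
        loss = neg_snr(x1, remix1, snr_max_db) + neg_snr(x2, remix2, snr_max_db)
        if loss < best_loss:
            best_loss, best_assign = loss, a
    return best_loss, best_assign


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    s = rng.standard_normal((4, 16000))   # four synthetic "sources"
    x1, x2 = s[0] + s[1], s[2] + s[3]     # two reference mixtures
    est = np.stack([s[0], s[2], s[1], s[3]])  # a perfect but permuted estimate
    loss, assign = mixit_loss(x1, x2, est)
    print(loss, assign)
```

Note that the objective only asks the estimates to remix into the two observed mixtures; nothing tells the model which groupings of sounds count as a "stem". This is the ambiguity the abstract points to. The exhaustive search costs 2^M loss evaluations, which is manageable for the small M typical of separation models.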
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis
