Few-Shot Musical Source Separation
Yu Wang, Daniel Stoller, Rachel M. Bittner, Juan Pablo Bello

TL;DR
This paper introduces a few-shot learning approach for musical source separation, enabling models to generalize to unseen instruments by conditioning on a few audio examples, thus broadening the applicability of source separation models.
Contribution
We propose a novel few-shot conditioning framework for source separation that generalizes to unseen instruments by jointly training a conditioning encoder and a U-Net model.
Findings
Outperforms the baseline one-hot instrument-class conditioned model on both seen and unseen instruments
Effective with various conditioning example characteristics
Generalizes well to real-world recordings
Abstract
Deep learning-based approaches to musical source separation are often limited to the instrument classes that the models are trained on and do not generalize to separate unseen instruments. To address this, we propose a few-shot musical source separation paradigm. We condition a generic U-Net source separation model using a few audio examples of the target instrument. We train a few-shot conditioning encoder jointly with the U-Net to encode the audio examples into a conditioning vector that configures the U-Net via feature-wise linear modulation (FiLM). We evaluate the trained models on real musical recordings in the MUSDB18 and MedleyDB datasets. We show that our proposed few-shot conditioning paradigm outperforms the baseline one-hot instrument-class conditioned model for both seen and unseen instruments. To extend the scope of our approach to a wider variety of real-world scenarios, we…
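To make the conditioning mechanism concrete, here is a minimal NumPy sketch of FiLM applied to a single feature map. It is an illustration under stated assumptions, not the authors' implementation: the helper names (`encode_examples`, `film`), the random weights, the tensor sizes, and in particular the choice to average per-example embeddings into one conditioning vector are all hypothetical. The only part taken from the abstract is the general idea that a conditioning vector derived from a few examples sets a per-channel scale and shift on the U-Net features.

```python
import numpy as np

rng = np.random.default_rng(0)


def encode_examples(examples, w_enc):
    """Hypothetical conditioning encoder.

    examples: (n_shots, n_features) feature vectors, one per audio example.
    w_enc:    (n_features, cond_dim) projection weights.
    Returns a single (cond_dim,) conditioning vector.
    Averaging over shots is an assumption made for this sketch.
    """
    embeddings = np.tanh(examples @ w_enc)  # (n_shots, cond_dim)
    return embeddings.mean(axis=0)


def film(feature_map, cond, w_gamma, w_beta):
    """Feature-wise linear modulation: gamma * x + beta, per channel.

    feature_map: (channels, freq, time) activations from one layer.
    cond:        (cond_dim,) conditioning vector.
    w_gamma, w_beta: (cond_dim, channels) linear maps to scale and shift.
    """
    gamma = cond @ w_gamma  # (channels,)
    beta = cond @ w_beta    # (channels,)
    # Broadcast the per-channel scale and shift over frequency and time.
    return gamma[:, None, None] * feature_map + beta[:, None, None]


# Illustrative sizes only; none of these come from the paper.
n_shots, n_features, cond_dim, channels = 5, 16, 8, 4
examples = rng.standard_normal((n_shots, n_features))
w_enc = rng.standard_normal((n_features, cond_dim))
w_gamma = rng.standard_normal((cond_dim, channels))
w_beta = rng.standard_normal((cond_dim, channels))
feature_map = rng.standard_normal((channels, 6, 10))

cond = encode_examples(examples, w_enc)
modulated = film(feature_map, cond, w_gamma, w_beta)
```

In a trained system the encoder and the scale and shift projections would be learned jointly with the separation network, so that examples of a new instrument steer the same U-Net weights toward that instrument.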
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis
