Few-Shot Musical Source Separation

Yu Wang; Daniel Stoller; Rachel M. Bittner; Juan Pablo Bello

arXiv:2205.01273·cs.SD·May 4, 2022

TL;DR

This paper introduces a few-shot learning approach for musical source separation, enabling models to generalize to unseen instruments by conditioning on a few audio examples, thus broadening the applicability of source separation models.

Contribution

We propose a novel few-shot conditioning framework for source separation that generalizes to unseen instruments by jointly training a conditioning encoder and a U-Net model.

Findings

01 Outperforms the baseline one-hot instrument-class conditioned model on both seen and unseen instruments

02 Effective with various conditioning example characteristics

03 Generalizes well to real-world recordings

Abstract

Deep learning-based approaches to musical source separation are often limited to the instrument classes that the models are trained on and do not generalize to separate unseen instruments. To address this, we propose a few-shot musical source separation paradigm. We condition a generic U-Net source separation model using few audio examples of the target instrument. We train a few-shot conditioning encoder jointly with the U-Net to encode the audio examples into a conditioning vector to configure the U-Net via feature-wise linear modulation (FiLM). We evaluate the trained models on real musical recordings in the MUSDB18 and MedleyDB datasets. We show that our proposed few-shot conditioning paradigm outperforms the baseline one-hot instrument-class conditioned model for both seen and unseen instruments. To extend the scope of our approach to a wider variety of real-world scenarios, we…
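The conditioning step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract only states that audio examples are encoded into a conditioning vector that configures the U-Net via FiLM. The mean pooling over examples, the linear projections, the array shapes, and all function names below are assumptions made for illustration.

```python
import numpy as np


def conditioning_vector(example_embeddings):
    """Pool per-example embeddings into one conditioning vector.

    Assumption: mean pooling over the few-shot examples. The abstract does
    not say how the encoder aggregates multiple examples.
    """
    return example_embeddings.mean(axis=0)


def film_parameters(cond, w_gamma, b_gamma, w_beta, b_beta):
    """Map the conditioning vector to per-channel scale and shift.

    Assumption: plain linear projections (hypothetical weights).
    """
    gamma = cond @ w_gamma + b_gamma
    beta = cond @ w_beta + b_beta
    return gamma, beta


def film(features, gamma, beta):
    """Feature-wise linear modulation: scale and shift each channel.

    features: (channels, freq, time); gamma, beta: (channels,)
    """
    return gamma[:, None, None] * features + beta[:, None, None]


# Toy data standing in for encoder outputs and one U-Net feature map.
rng = np.random.default_rng(0)
n_examples, embed_dim, channels = 5, 8, 4
example_embeddings = rng.normal(size=(n_examples, embed_dim))
features = rng.normal(size=(channels, 16, 32))

# Hypothetical projection weights; in a trained model these are learned.
w_gamma = rng.normal(size=(embed_dim, channels))
b_gamma = np.ones(channels)
w_beta = rng.normal(size=(embed_dim, channels))
b_beta = np.zeros(channels)

cond = conditioning_vector(example_embeddings)
gamma, beta = film_parameters(cond, w_gamma, b_gamma, w_beta, b_beta)
modulated = film(features, gamma, beta)
```

With gamma set to ones and beta to zeros, `film` returns the feature map unchanged, so the conditioning vector only alters the separation network through the learned scale and shift.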

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis