Demucs: Deep Extractor for Music Sources with extra unlabeled data remixed
Alexandre Défossez (SIERRA, PSL, FAIR), Nicolas Usunier (FAIR), Léon Bottou (FAIR), Francis Bach (PSL, DI-ENS, SIERRA)

TL;DR
This paper introduces a new deep learning model for music source separation that outperforms previous waveform-based methods and leverages unlabeled data through a novel remixing scheme, closing the gap with spectrogram-based approaches.
Contribution
It presents a simple convolutional-recurrent waveform model that surpasses Wave-U-Net, the state-of-the-art waveform model, and a novel semi-supervised training scheme that uses unlabeled music tracks.
Findings
Demucs outperforms Wave-U-Net, the previous state-of-the-art waveform model, by 1.6 SDR points.
The semi-supervised scheme improves separation quality.
Waveform methods can match spectrogram-based approaches.
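The SDR figures in the findings above are expressed in decibels. As a rough illustration of what such a number measures, here is a minimal sketch of a simplified signal-to-distortion ratio, assuming the plain energy-ratio definition 10·log10(‖s‖² / ‖s − ŝ‖²). The function name `sdr` and the toy signals are illustrative assumptions, not taken from the paper; benchmark evaluations typically rely on a more involved BSS Eval style metric, so values from this sketch are not directly comparable to published scores.

```python
import numpy as np


def sdr(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-10) -> float:
    """Simplified signal-to-distortion ratio in dB (assumed energy-ratio form)."""
    signal_power = np.sum(reference ** 2)
    error_power = np.sum((reference - estimate) ** 2)
    # eps guards against division by zero for a perfect or silent estimate.
    return float(10.0 * np.log10((signal_power + eps) / (error_power + eps)))


# Toy example: one second of a 440 Hz tone sampled at 44.1 kHz.
reference = np.sin(2 * np.pi * 440 * np.arange(44100) / 44100)

# An estimate with a 10% amplitude error leaves 1% of the energy as error,
# which corresponds to roughly 20 dB under this definition.
print(sdr(reference, 0.9 * reference))
```

Under this simplified definition, a gain of 1.6 dB corresponds to the residual error energy shrinking by a factor of about 10^0.16 ≈ 1.45 for the same reference signal.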
Abstract
We study the problem of source separation for music using deep learning with four known sources: drums, bass, vocals and other accompaniments. State-of-the-art approaches predict soft masks over mixture spectrograms while methods working on the waveform are lagging behind as measured on the standard MusDB benchmark. Our contribution is twofold. (i) We introduce a simple convolutional and recurrent model that outperforms the state-of-the-art model on waveforms, that is, Wave-U-Net, by 1.6 points of SDR (signal-to-distortion ratio). (ii) We propose a new scheme to leverage unlabeled music. We train a first model to extract parts with at least one source silent in unlabeled tracks, for instance without bass. We remix this extract with a bass line taken from the supervised dataset to form a new weakly supervised training example. Combining our architecture and scheme, we show that waveform…
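The remixing step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the extract from the unlabeled track has already been found (with bass silent), that waveforms are NumPy arrays of shape (channels, samples), and that the only supervision available for the new example is the known bass stem plus the sum of the remaining sources. The function name `remix_with_bass` is hypothetical.

```python
import numpy as np


def remix_with_bass(extract_no_bass: np.ndarray, bass_stem: np.ndarray):
    """Assemble one weakly supervised example (hypothetical helper).

    extract_no_bass: excerpt from an unlabeled track in which bass is silent.
    bass_stem: isolated bass line taken from the supervised dataset.
    Both arrays are assumed to have shape (channels, samples).
    """
    # Crop both signals to a common length along the time axis.
    length = min(extract_no_bass.shape[-1], bass_stem.shape[-1])
    rest = extract_no_bass[..., :length]
    bass = bass_stem[..., :length]

    # The new mixture is the bass-free extract plus the labeled bass line.
    mixture = rest + bass

    # Known targets: the bass stem, and the sum of all other sources.
    # The individual drums / vocals / other stems remain unknown.
    return mixture, bass, rest


# Toy usage with random stereo audio standing in for real excerpts.
rng = np.random.default_rng(0)
extract = rng.standard_normal((2, 44100))
bass = rng.standard_normal((2, 40000))
mixture, target_bass, target_rest = remix_with_bass(extract, bass)
print(mixture.shape)
```

The example is "weakly" supervised in the sense that only the bass target is fully known; how the remaining sources are handled in the training loss is not stated in this summary and is left open here.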
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis
