Demucs: Deep Extractor for Music Sources with extra unlabeled data remixed

Alexandre Défossez (SIERRA, PSL, FAIR), Nicolas Usunier (FAIR), Léon Bottou (FAIR), Francis Bach (PSL, DI-ENS, SIERRA)

arXiv:1909.01174 · cs.SD · September 4, 2019 · 57 cites

TL;DR

This paper introduces a new deep learning model for music source separation that outperforms previous waveform-based methods and leverages unlabeled data through a novel remixing scheme, closing the gap with spectrogram-based approaches.

Contribution

It presents a simple convolutional and recurrent waveform model that surpasses Wave-U-Net, the state-of-the-art waveform model, and a novel semi-supervised training scheme that uses unlabeled music tracks.

Findings

01

Demucs outperforms Wave-U-Net, the previous state-of-the-art waveform model, by 1.6 SDR points.

02

The semi-supervised scheme improves separation quality.

03

With the architecture and the remixing scheme combined, waveform methods close the gap with spectrogram-based approaches.
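
For readers unfamiliar with the metric behind the "1.6 SDR points" figure, here is a minimal sketch of a signal-to-distortion ratio in decibels. It assumes the simplified definition 10·log10(‖s‖² / ‖s − ŝ‖²). MusDB evaluations typically rely on a more elaborate BSS Eval procedure, so this is an illustration of the idea rather than the paper's evaluation code. The function name `sdr_db` and the toy signals are hypothetical.

```python
import math

def sdr_db(reference, estimate, eps=1e-10):
    """Simplified SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    Assumption: this is the plain energy-ratio definition, not the
    BSS Eval variant normally used for MusDB leaderboards.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(s * s for s in reference)
    error_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    # eps guards against division by zero and log(0) on silent or perfect signals.
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))

# Toy reference source and two estimates that differ only by a gain error.
reference = [math.sin(0.1 * n) for n in range(1000)]
estimate_a = [0.9 * s for s in reference]  # 10% amplitude error
estimate_b = [0.8 * s for s in reference]  # 20% amplitude error

gap = sdr_db(reference, estimate_a) - sdr_db(reference, estimate_b)
print(f"SDR A: {sdr_db(reference, estimate_a):.2f} dB")
print(f"SDR B: {sdr_db(reference, estimate_b):.2f} dB")
print(f"gap:   {gap:.2f} dB")
```

Higher is better: halving the amplitude of the residual error raises this simplified SDR by about 6 dB, which gives a sense of scale for a 1.6-point gap.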

Abstract

We study the problem of source separation for music using deep learning with four known sources: drums, bass, vocals and other accompaniments. State-of-the-art approaches predict soft masks over mixture spectrograms while methods working on the waveform are lagging behind as measured on the standard MusDB benchmark. Our contribution is twofold. (i) We introduce a simple convolutional and recurrent model that outperforms the state-of-the-art model on waveforms, that is, Wave-U-Net, by 1.6 points of SDR (signal-to-distortion ratio). (ii) We propose a new scheme to leverage unlabeled music. We train a first model to extract parts with at least one source silent in unlabeled tracks, for instance without bass. We remix this extract with a bass line taken from the supervised dataset to form a new weakly supervised training example. Combining our architecture and scheme, we show that waveform…
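
The remixing step described in (ii) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: waveforms are plain lists of samples, the extract is assumed to have already been selected by a first model as containing no bass, and the names `remix_example`, `known_target`, and `residual_target` are hypothetical.

```python
import random

def remix_example(extract_without_bass, labeled_bass_lines, rng):
    """Build one weakly supervised example by remixing.

    extract_without_bass: samples from an unlabeled track in which the
        bass is assumed silent (selected by a first model, not shown here).
    labeled_bass_lines: isolated bass waveforms from the supervised dataset.
    rng: a random.Random instance used to pick the bass line.
    """
    bass = rng.choice(labeled_bass_lines)
    n = min(len(extract_without_bass), len(bass))  # crop to a common length
    mixture = [extract_without_bass[i] + bass[i] for i in range(n)]
    return {
        "mixture": mixture,
        "known_source": "bass",
        # The bass target is known exactly because we added it ourselves.
        "known_target": bass[:n],
        # The remaining sources are known only as their sum (the extract),
        # which is why the example is weakly rather than fully supervised.
        "residual_target": extract_without_bass[:n],
    }

# Toy data: a three-sample extract and a single labeled bass line.
demo_extract = [0.5, 0.25, -0.5]
demo_bass_lines = [[1.0, 0.5, 0.25, 0.0]]
demo = remix_example(demo_extract, demo_bass_lines, random.Random(0))
print(demo["mixture"])
```

The resulting mixture has an exact target for the remixed source and only an aggregate target for everything else, which matches the abstract's description of a "weakly supervised training example".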

Code & Models

Repositories

facebookresearch/demucs
pytorch

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis