Music Source Separation in the Waveform Domain

Alexandre Défossez (FAIR; SIERRA; PSL); Nicolas Usunier (FAIR); Léon Bottou (FAIR); Francis Bach (DI-ENS; PSL; SIERRA)

arXiv:1911.13254 · cs.SD · April 29, 2021 · 184 cites

Open Access · 1 Repo

TL;DR

This paper introduces Demucs, a waveform-to-waveform music source separation model that combines a U-Net structure with a bidirectional LSTM. It outperforms existing methods in accuracy and naturalness, and it can be compressed efficiently.

Contribution

Demucs is a new waveform-domain model for music source separation that surpasses both state-of-the-art spectrogram-based methods and Conv-Tasnet, with improved naturalness and efficiency.

Findings

01

Demucs achieves a signal-to-distortion ratio (SDR) of 6.3 dB on MusDB, surpassing previous methods.

02

Proper data augmentation enhances Demucs performance.

03

Demucs can be compressed to 120 MB without accuracy loss.
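
The SDR figure in the findings above is a signal-to-distortion ratio, where higher is better. As a rough illustration only, the sketch below computes a simplified SDR, 10 * log10(||s||^2 / ||s - s_hat||^2), for a reference stem and an estimate. This is an assumption made for clarity: MusDB results are normally reported with a BSS Eval style SDR, which is defined differently, so the function name `sdr_db` and the toy signals here are hypothetical and will not reproduce published numbers.

```python
import math


def sdr_db(reference, estimate, eps=1e-10):
    """Simplified SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    Illustrative definition only; not the BSS Eval metric used for MusDB.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    # Energy of the clean reference stem.
    signal_power = sum(r * r for r in reference)
    # Energy of the residual between the reference and the estimate.
    error_power = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    # eps guards against division by zero and log of zero.
    return 10.0 * math.log10((signal_power + eps) / (error_power + eps))


# Toy example: the estimate is the reference scaled by 0.9,
# so the error energy is 1% of the signal energy.
reference = [1.0, -1.0, 1.0, -1.0]
estimate = [0.9, -0.9, 0.9, -0.9]
print(round(sdr_db(reference, estimate), 3))
```

A residual carrying 1% of the reference energy corresponds to roughly 20 dB under this simplified definition, which gives a feel for the scale on which a score such as 6.3 sits.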

Abstract

Source separation for music is the task of isolating contributions, or stems, from different instruments recorded individually and arranged together to form a song. Such components include voice, bass, drums and any other accompaniments. Contrarily to many audio synthesis tasks where the best performances are achieved by models that directly generate the waveform, the state-of-the-art in source separation for music is to compute masks on the magnitude spectrum. In this paper, we compare two waveform domain architectures. We first adapt Conv-Tasnet, initially developed for speech source separation, to the task of music source separation. While Conv-Tasnet beats many existing spectrogram-domain methods, it suffers from significant artifacts, as shown by human evaluations. We propose instead Demucs, a novel waveform-to-waveform model, with a U-Net structure and bidirectional LSTM. Experiments…
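
To make the phrase "compute masks on the magnitude spectrum" concrete, here is a minimal toy sketch of the spectrogram-domain approach that the abstract contrasts with waveform models. It rests on several assumptions not taken from the paper: a single 16-sample frame instead of a short-time Fourier transform, a hand-set binary mask instead of a learned one, and hypothetical helper names (`dft`, `idft`, `separate_with_mask`). The mixture phase is reused unchanged, which is the usual simplification in masking methods.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def separate_with_mask(mixture, mask):
    """Scale each bin's magnitude by the mask and keep the mixture phase."""
    spectrum = dft(mixture)
    masked = [m * abs(c) * cmath.exp(1j * cmath.phase(c))
              for m, c in zip(mask, spectrum)]
    return idft(masked)


n = 16
# Two toy "sources": a low tone in bin 1 and a high tone in bin 4.
low_tone = [math.sin(2 * math.pi * 1 * t / n) for t in range(n)]
high_tone = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
mixture = [a + b for a, b in zip(low_tone, high_tone)]

# Binary mask that keeps only bin 1 and its mirror bin n - 1.
mask = [0.0] * n
mask[1] = 1.0
mask[n - 1] = 1.0

recovered = separate_with_mask(mixture, mask)
max_error = max(abs(a - b) for a, b in zip(recovered, low_tone))
print(max_error < 1e-9)
```

A waveform-domain model such as Demucs skips this transform, mask and inverse pipeline and maps the mixture waveform directly to the stem waveforms.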

Code & Models

Repositories

facebookresearch/demucs
PyTorch · Official

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis