Hybrid Spectrogram and Waveform Source Separation

Alexandre Défossez

arXiv:2111.03600 · eess.AS · August 31, 2022 · 71 cites

TL;DR

This paper introduces a hybrid source separation model that combines spectrogram and waveform processing, improving SDR by 1.4 dB on MusDB HQ and raising subjective quality ratings for music demixing.

Contribution

It presents an end-to-end hybrid Demucs architecture that lets the model choose between spectrogram and waveform processing for each source, or combine both. The model won the Music Demixing Challenge 2021 organized by Sony.

Findings

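01  1.4 dB SDR improvement across all sources, measured on the MusDB HQ dataset

02  Higher subjective ratings in human evaluation: overall quality of 2.83 out of 5, against 2.36 for the non-hybrid Demucs

03  Additional architectural improvements: compressed residual branches, local attention, and singular value regularization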

Abstract

Source separation models work in either the spectrogram or the waveform domain. In this work, we show how to perform end-to-end hybrid source separation, letting the model decide which domain is best suited for each source, and even combining both. The proposed hybrid version of the Demucs architecture won the Music Demixing Challenge 2021 organized by Sony. This architecture also comes with additional improvements, such as compressed residual branches, local attention, and singular value regularization. Overall, a 1.4 dB improvement of the Signal-to-Distortion Ratio (SDR) was observed across all sources as measured on the MusDB HQ dataset, an improvement confirmed by human subjective evaluation, with an overall quality rated at 2.83 out of 5 (2.36 for the non-hybrid Demucs), and absence of contamination at 3.04 (against 2.37 for the non-hybrid Demucs and 2.44 for the second-ranking model submitted…
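
As a rough illustration of the metric quoted in the abstract, the sketch below computes a signal-to-distortion ratio in decibels for a single source. It assumes the plain energy-ratio definition, 10 log10(||s||^2 / ||s - s_hat||^2). The exact evaluation protocol behind the 1.4 dB figure is not given in this excerpt, and the names `sdr_db` and `eps` are illustrative, not taken from the Demucs code base.

```python
import math


def sdr_db(reference, estimate, eps=1e-10):
    """Signal-to-distortion ratio in dB for one source.

    Assumed definition (not confirmed by this excerpt):
    10 * log10(energy of reference / energy of the estimation error).
    `eps` avoids division by zero and log(0) on silent or perfect inputs.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))


# Toy example: the estimate is the reference scaled by 0.9, so the error
# energy is 1% of the signal energy and the SDR is close to 20 dB.
reference = [1.0, -1.0, 1.0, -1.0]
estimate = [0.9 * r for r in reference]
print(f"SDR: {sdr_db(reference, estimate):.2f} dB")

# A gain of g dB divides the error energy by 10 ** (g / 10),
# holding the reference signal fixed.
error_reduction = 10 ** (1.4 / 10)
print(f"1.4 dB gain -> error energy divided by {error_reduction:.2f}")
```

Under this definition, and for a fixed reference signal, a 1.4 dB gain corresponds to the error energy shrinking by a factor of about 1.38. Figures averaged over many tracks and sources do not translate this directly, so the factor is only an order-of-magnitude reading.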

Code & Models

Repositories

facebookresearch/demucs (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Blind Source Separation Techniques