End-to-end music source separation: is it possible in the waveform domain?

Francesc Lluís; Jordi Pons; Xavier Serra

arXiv:1810.12187·cs.SD·July 1, 2019

TL;DR

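This paper investigates whether end-to-end, waveform-based models are viable for music source separation. Because such models take the raw audio as input, they have access to all of the signal's information, including the phase that magnitude-spectrogram models discard. The results indicate that they can perform similarly to, if not better than, a spectrogram-based deep learning model.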

Contribution

The study proposes a Wavenet-based model for music source separation and evaluates it, together with Wave-U-Net, against DeepConvSep, a recent spectrogram-based deep learning model.

Findings

01

Waveform-based models can perform similarly to, if not better than, a spectrogram-based deep learning model.

02

End-to-end models take into account all the information in the raw audio signal, including the phase.

03

The proposed Wavenet-based model and Wave-U-Net can outperform DeepConvSep.

Abstract

Most of the currently successful source separation techniques use the magnitude spectrogram as input, and are therefore by default omitting part of the signal: the phase. To avoid omitting potentially useful information, we study the viability of using end-to-end models for music source separation, which take into account all the information available in the raw audio signal, including the phase. Although during the last decades end-to-end music source separation has been considered almost unattainable, our results confirm that waveform-based models can perform similarly to (if not better than) a spectrogram-based deep learning model. Namely: a Wavenet-based model we propose and Wave-U-Net can outperform DeepConvSep, a recent spectrogram-based deep learning model.
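The abstract's point that a magnitude-only input omits part of the signal can be illustrated with a small sketch. This is not code from the paper: it uses a single FFT frame of a synthetic two-tone signal rather than a full spectrogram pipeline, and the helper names `split_magnitude_phase` and `rebuild` are hypothetical, chosen for illustration only.

```python
import numpy as np


def split_magnitude_phase(frame):
    """Return the magnitude and phase of one real-valued audio frame."""
    spectrum = np.fft.rfft(frame)
    return np.abs(spectrum), np.angle(spectrum)


def rebuild(magnitude, phase, n):
    """Invert a magnitude/phase pair back to an n-sample waveform."""
    return np.fft.irfft(magnitude * np.exp(1j * phase), n=n)


# Synthetic frame (an assumption for illustration): two sinusoids plus a little noise.
rng = np.random.default_rng(0)
n = 1024
t = np.arange(n) / 16000.0
frame = (
    np.sin(2 * np.pi * 440 * t)
    + 0.5 * np.sin(2 * np.pi * 880 * t + 1.0)
    + 0.01 * rng.standard_normal(n)
)

magnitude, phase = split_magnitude_phase(frame)

# Using the original phase recovers the waveform up to floating-point error.
with_phase = rebuild(magnitude, phase, n)

# Discarding the phase (setting it to zero) keeps the same magnitude spectrum
# but produces a different waveform.
without_phase = rebuild(magnitude, np.zeros_like(phase), n)

err_with = np.max(np.abs(frame - with_phase))
err_without = np.max(np.abs(frame - without_phase))
same_magnitude = np.allclose(np.abs(np.fft.rfft(without_phase)), magnitude)

print("max error with phase:   ", err_with)
print("max error without phase:", err_without)
print("magnitudes identical:   ", same_magnitude)
```

The two reconstructions share the same magnitude spectrum yet differ as waveforms, which is the information a magnitude-only model never sees and a waveform-based model receives directly.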
