TL;DR
This paper investigates whether end-to-end, waveform-based music source separation models are viable, and shows that they can match or surpass a spectrogram-based deep learning model while using all of the information in the raw audio, including the phase.
Contribution
The study proposes a Wavenet-based separation model and evaluates it alongside Wave-U-Net, showing that both are competitive with DeepConvSep, a recent spectrogram-based deep learning model.
Findings
Waveform-based models can perform similarly to, or better than, spectrogram-based models.
End-to-end models take the phase into account, which magnitude-spectrogram models omit by design.
The proposed Wavenet-based model and Wave-U-Net both outperform DeepConvSep.
Abstract
Most of the currently successful source separation techniques use the magnitude spectrogram as input and therefore, by default, omit part of the signal: the phase. To avoid omitting potentially useful information, we study the viability of using end-to-end models for music source separation, which take into account all the information available in the raw audio signal, including the phase. Although during the last decades end-to-end music source separation has been considered almost unattainable, our results confirm that waveform-based models can perform similarly to (if not better than) a spectrogram-based deep learning model. Namely, a Wavenet-based model we propose and Wave-U-Net can outperform DeepConvSep, a recent spectrogram-based deep learning model.
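To make the abstract's point about discarded phase concrete, here is a minimal NumPy sketch (not taken from the paper). It shows that two different waveforms can share exactly the same magnitude spectrogram, and that the phase is needed to get the signal back. The framing parameters (256-sample Hann frames, hop of 128) and the helper names `frame_signal`, `stft_mag_phase` and `frames_from` are illustrative assumptions. Negating the signal is simply the easiest exact case of two waveforms with identical magnitudes.

```python
import numpy as np

def frame_signal(x: np.ndarray, frame_len: int = 256, hop: int = 128) -> np.ndarray:
    """Split x into overlapping Hann-windowed frames, shape (n_frames, frame_len)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] * window for i in range(n_frames)])

def stft_mag_phase(x: np.ndarray, frame_len: int = 256, hop: int = 128):
    """Short-time Fourier transform split into magnitude and phase arrays."""
    spec = np.fft.rfft(frame_signal(x, frame_len, hop), axis=1)
    return np.abs(spec), np.angle(spec)

def frames_from(mag: np.ndarray, phase: np.ndarray, frame_len: int = 256) -> np.ndarray:
    """Invert each frame's real FFT from a magnitude array and a phase array."""
    return np.fft.irfft(mag * np.exp(1j * phase), n=frame_len, axis=1)

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)  # stand-in for a mixture waveform (assumption: random noise)
y = -x                         # a different waveform: every nonzero bin's phase shifted by pi

mag_x, phase_x = stft_mag_phase(x)
mag_y, phase_y = stft_mag_phase(y)

# 1) Identical magnitude spectrograms: a magnitude-only input cannot tell x from y.
print(np.allclose(mag_x, mag_y))  # True

# 2) Magnitude plus phase recovers the windowed frames; magnitude with zero phase does not.
print(np.allclose(frames_from(mag_x, phase_x), frame_signal(x)))                   # True
print(np.allclose(frames_from(mag_x, np.zeros_like(phase_x)), frame_signal(x)))    # False
```

A spectrogram-based separator therefore has to estimate or borrow the phase (for example, reuse the mixture's phase) when it resynthesizes each source, whereas a waveform-based model never drops it in the first place.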
