Source Separation of Multi-source Raw Music using a Residual Quantized Variational Autoencoder
Leonardo Berti

TL;DR
This paper introduces a neural audio codec based on residual quantized variational autoencoders for musical source separation, achieving near state-of-the-art results with reduced computational requirements.
Contribution
The paper presents a novel residual quantized variational autoencoder architecture for source separation, trained on multi-track music data, with improved efficiency and competitive performance.
Findings
Achieves near state-of-the-art separation results
Requires less computational power than comparable models
Code is publicly available for reproducibility
Abstract
I develop a neural audio codec model based on the residual quantized variational autoencoder architecture. I train the model on Slakh2100, a standard multi-track audio dataset for musical source separation. The model separates audio sources, achieving near state-of-the-art (SoTA) results with much less computing power. The code is publicly available at github.com/LeonardoBerti00/Source-Separation-of-Multi-source-Music-using-Residual-Quantizad-Variational-Autoencoder
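As a rough illustration of the "residual quantized" part of the architecture named in the abstract, the sketch below shows generic residual vector quantization in NumPy: each stage picks the nearest codeword for whatever the previous stages left unexplained, and the quantized latent is the sum of the chosen codewords. This is not the author's implementation. The function name `residual_quantize`, the array shapes, and the random codebooks are all assumptions made for illustration; in a trained model the codebooks would be learned jointly with the encoder and decoder.

```python
import numpy as np


def residual_quantize(x, codebooks):
    """Quantize latents x of shape (n, d) with a list of codebooks, each of shape (k, d).

    Hypothetical helper for illustration; not taken from the paper's code.
    Returns the quantized latents (n, d) and the chosen indices (n, num_stages).
    """
    residual = x.copy()
    quantized = np.zeros_like(x)
    indices = []
    for codebook in codebooks:
        # Squared Euclidean distance from every residual to every codeword: (n, k).
        dists = ((residual[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
        idx = dists.argmin(axis=1)
        chosen = codebook[idx]
        # Accumulate the approximation and pass what is left to the next stage.
        quantized += chosen
        residual -= chosen
        indices.append(idx)
    return quantized, np.stack(indices, axis=1)


# Toy usage with random data (assumed sizes: 8 latents of dimension 4, 3 stages of 16 codewords).
rng = np.random.default_rng(0)
latents = rng.normal(size=(8, 4))
codebooks = [rng.normal(size=(16, 4)) * scale for scale in (1.0, 0.5, 0.25)]
quantized, codes = residual_quantize(latents, codebooks)
```

Each latent is then represented by one small integer per stage (`codes`), and a decoder only needs those integers plus the codebooks to rebuild `quantized`.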
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
