Simultaneous Music Separation and Generation Using Multi-Track Latent Diffusion Models

Tornike Karchkhadze; Mohammad Rasool Izadi; Shlomo Dubnov

arXiv:2409.12346·cs.SD·December 31, 2024

TL;DR

This paper presents a latent diffusion model that performs both music source separation and multi-track music generation, and can generate arrangements by producing any subset of tracks given the others. Trained on Slakh2100, it improves on an existing simultaneous generation and separation model across objective metrics.

Contribution

Introduces a novel latent diffusion-based multi-track model that unifies music separation and generation, advancing the integration of these tasks within a single framework.

Findings

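01 Significant improvements in objective source separation metrics over an existing simultaneous generation and separation model.

02 Multi-track music synthesis and arrangement generation, where any subset of tracks is created given the others.

03 The model is trained on the Slakh2100 dataset and outperforms that baseline on music and arrangement generation as well.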

Abstract

Diffusion models have recently shown strong potential in both music generation and music source separation tasks. Although in early stages, a trend is emerging towards integrating these tasks into a single framework, as both involve generating musically aligned parts and can be seen as facets of the same generative process. In this work, we introduce a latent diffusion-based multi-track generation model capable of both source separation and multi-track music synthesis by learning the joint probability distribution of tracks sharing a musical context. Our model also enables arrangement generation by creating any subset of tracks given the others. We trained our model on the Slakh2100 dataset, compared it with an existing simultaneous generation and separation model, and observed significant improvements across objective metrics for source separation, music, and arrangement generation…
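
The idea of "creating any subset of tracks given the others" can be illustrated with a toy sketch. The abstract does not say how the conditioning is implemented, so the code below swaps in a generic inpainting-style diffusion sampler as an assumption: the given tracks are re-noised to the current noise level and written back at every reverse step, while the missing tracks are sampled. The names `make_schedule`, `toy_denoiser`, and `generate_missing_tracks`, the linear noise schedule, and the latent shape are all hypothetical, and `toy_denoiser` is only a placeholder for a trained network.

```python
import numpy as np

def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM noise schedule (an assumed choice, not taken from the paper)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def toy_denoiser(x_t, t):
    """Hypothetical placeholder for a trained noise-prediction network.

    A real model would look at all tracks jointly; this one predicts zero noise.
    """
    return np.zeros_like(x_t)

def generate_missing_tracks(known, known_mask, denoiser, num_steps=50, seed=0):
    """Sample the tracks where known_mask is False, keeping the others fixed.

    known:      array of shape (num_tracks, latent_dim) with the given track latents
    known_mask: boolean array of shape (num_tracks,), True for given tracks
    """
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    mask = known_mask[:, None]                 # broadcast over the latent dimension
    x = rng.standard_normal(known.shape)       # start every track from pure noise

    for t in range(num_steps - 1, -1, -1):
        # Noise the given tracks to level t and overwrite them in the current state.
        noise = rng.standard_normal(known.shape)
        known_t = np.sqrt(alpha_bars[t]) * known + np.sqrt(1.0 - alpha_bars[t]) * noise
        x = np.where(mask, known_t, x)

        # Standard DDPM ancestral step applied to the full stack of tracks.
        eps = denoiser(x, t)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
        else:
            x = mean

    # Return the clean given tracks together with the sampled ones.
    return np.where(mask, known, x)

# Usage: four hypothetical tracks with 8-dimensional latents; tracks 0 and 3 are given.
known = np.zeros((4, 8))
known[0] = 1.0
known[3] = -1.0
known_mask = np.array([True, False, False, True])
out = generate_missing_tracks(known, known_mask, toy_denoiser)
```

With a trained joint model in place of `toy_denoiser`, setting every entry of `known_mask` to False would correspond to generating all tracks at once, which is one way to read the abstract's framing of generation and arrangement as facets of the same process.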

Code & Models

Repositories

karchkha/msg-ld
PyTorch · Official

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing