Multi-Track MusicLDM: Towards Versatile Music Generation with Latent Diffusion Model

Tornike Karchkhadze; Mohammad Rasool Izadi; Ke Chen; Gerard Assayag; Shlomo Dubnov

arXiv:2409.02845·cs.SD·October 24, 2024

Open Access

TL;DR

This paper introduces Multi-Track MusicLDM, a latent diffusion model that generates and arranges multi-track music with improved coherence and control, advancing the capabilities of AI in complex music composition tasks.

Contribution

The paper extends MusicLDM into a multi-track model capable of generating and arranging music with multiple tracks, offering better coherence and control than previous models.

Findings

1. Significantly improved objective metrics for multi-track music generation
2. Effective arrangement generation by predicting missing tracks given others
3. Enhanced coherence among generated tracks in multi-track compositions
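
The second finding, predicting missing tracks given others, can be illustrated with a small sketch. This is not the paper's implementation: it assumes an inpainting-style sampling loop, a toy linear noise schedule, and a hypothetical stand-in denoiser (`toy_joint_denoise`) in place of a trained multi-track model. It only shows the general idea of denoising all tracks jointly while repeatedly overwriting the given tracks with their known values.

```python
import random


def toy_joint_denoise(tracks, rng):
    """Hypothetical stand-in for one reverse-diffusion step of a trained model.

    It pulls every track toward the mean of all tracks, a crude way to mimic
    a model that denoises tracks jointly so they influence each other.
    """
    dim = len(tracks[0])
    mean = [sum(tr[j] for tr in tracks) / len(tracks) for j in range(dim)]
    return [
        [0.5 * v + 0.5 * mean[j] + 0.01 * rng.gauss(0.0, 1.0)
         for j, v in enumerate(tr)]
        for tr in tracks
    ]


def arrange(known, given, steps=10, seed=0):
    """Fill in missing tracks while keeping the given tracks fixed.

    known: list of tracks, each a list of floats (toy "latents").
    given: list of bools, True where the track is provided as context.
    """
    rng = random.Random(seed)
    dim = len(known[0])
    # Start every track from pure noise.
    x = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in known]
    for t in range(steps, 0, -1):
        level = (t - 1) / steps  # assumed linear noise level after this step
        # Denoise all tracks together so missing tracks see the given ones.
        x = toy_joint_denoise(x, rng)
        # Overwrite given tracks with their known values at the same noise level.
        for i, track in enumerate(known):
            if given[i]:
                x[i] = [(1 - level) * v + level * rng.gauss(0.0, 1.0)
                        for v in track]
    return x


if __name__ == "__main__":
    known = [[1.0] * 4, [2.0] * 4, [0.0] * 4]
    given = [True, True, False]  # third track is the one to generate
    result = arrange(known, given)
    print(result[2])
```

In this sketch the noise level reaches zero at the final step, so the given tracks are returned unchanged and only the missing track is generated.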

Abstract

Diffusion models have shown promising results in cross-modal generation tasks involving audio and music, such as text-to-sound and text-to-music generation. These text-controlled music generation models typically focus on generating music by capturing global musical attributes like genre and mood. However, music composition is a complex, multilayered task that often involves musical arrangement as an integral part of the process. This process involves composing each instrument to align with existing ones in terms of beat, dynamics, harmony, and melody, requiring greater precision and control over tracks than text prompts usually provide. In this work, we address these challenges by extending the MusicLDM, a latent diffusion model for music, into a multi-track generative model. By learning the joint probability of tracks sharing a context, our model is capable of generating music across…

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech Recognition and Synthesis