Multi-Track MusicLDM: Towards Versatile Music Generation with Latent Diffusion Model
Tornike Karchkhadze, Mohammad Rasool Izadi, Ke Chen, Gerard Assayag, Shlomo Dubnov

TL;DR
This paper introduces Multi-Track MusicLDM, a latent diffusion model that generates multi-track music and arranges new tracks around existing ones, with improved coherence and control over individual tracks.
Contribution
The paper extends MusicLDM into a multi-track model that both generates music with multiple tracks and arranges it, offering better coherence among tracks and finer control than previous models.
Findings
Significantly improved objective metrics for multi-track music generation
Effective arrangement generation by predicting missing tracks given others
Enhanced coherence among generated tracks in multi-track compositions
Abstract
Diffusion models have shown promising results in cross-modal generation tasks involving audio and music, such as text-to-sound and text-to-music generation. These text-controlled music generation models typically focus on generating music by capturing global musical attributes like genre and mood. However, music composition is a complex, multilayered task that often involves musical arrangement as an integral part of the process. This process involves composing each instrument to align with existing ones in terms of beat, dynamics, harmony, and melody, requiring greater precision and control over tracks than text prompts usually provide. In this work, we address these challenges by extending the MusicLDM, a latent diffusion model for music, into a multi-track generative model. By learning the joint probability of tracks sharing a context, our model is capable of generating music across…
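The arrangement task described above (generating missing tracks so that they fit given ones) can be illustrated as conditional sampling in which known tracks are held in place while the remaining tracks are denoised. The following is a toy sketch under stated assumptions, not the paper's implementation: `toy_denoiser`, `arrange`, the linear noise schedule, and the array shapes are all hypothetical, and a trained model would denoise all tracks jointly so that generated tracks can depend on the given ones.

```python
import numpy as np


def toy_denoiser(x, noise_level):
    """Hypothetical stand-in for a trained joint denoiser over all tracks.

    It only shrinks the input toward zero; a real model would predict a
    cleaner version of every track from the whole stack at once.
    """
    return 0.9 * x


def arrange(known, known_mask, steps=10, seed=0):
    """Fill in missing tracks while keeping the given tracks fixed.

    known:      array of shape (num_tracks, latent_dim); rows for missing
                tracks are ignored.
    known_mask: boolean array of shape (num_tracks,); True marks a given track.
    """
    rng = np.random.default_rng(seed)
    mask = known_mask[:, None]  # broadcast the per-track mask over latent_dim
    x = rng.standard_normal(known.shape)  # start every track from noise
    for i in range(steps, 0, -1):
        noise_level = i / steps  # assumed linear schedule, for illustration
        # Overwrite the given tracks with a noised copy at the current level,
        # so the denoiser always sees them alongside the missing tracks.
        noised_known = known + noise_level * rng.standard_normal(known.shape)
        x = np.where(mask, noised_known, x)
        x = toy_denoiser(x, noise_level)
    # Return the given tracks unchanged and the sampled values elsewhere.
    return np.where(mask, known, x)


if __name__ == "__main__":
    tracks = np.ones((4, 8))  # e.g. four tracks with 8-dim latents (made up)
    given = np.array([True, False, True, False])
    result = arrange(tracks, given)
    print(result.shape)
```

In this sketch the given tracks come back unchanged, and only the rows marked `False` in the mask are produced by the sampling loop.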
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech Recognition and Synthesis
