MusicGen-Stem: Multi-stem music generation and edition through autoregressive modeling

Simon Rouard; Robin San Roman; Yossi Adi; Axel Roebel

arXiv:2501.01757·cs.SD·January 8, 2025

TL;DR

MusicGen-Stem introduces a multi-stem autoregressive music generation model that learns dependencies between bass, drums, and other stems, enabling high-quality generation and flexible editing of individual stems within songs.

Contribution

To the authors' knowledge, it is the first open-source multi-stem autoregressive music generation model. It supports both generation and coherent editing of individual stems, relying on one specialized tokenizer per stem and on recent source separation techniques.

Findings

01. Effective multi-stem generation with coherent dependencies

02. Ability to edit individual stems in existing or generated music

03. Open-source release with code, models, and samples

Abstract

While most music generation models generate a mixture of stems (in mono or stereo), we propose to train a multi-stem generative model with 3 stems (bass, drums and other) that learns the musical dependencies between them. To do so, we train one specialized compression algorithm per stem to tokenize the music into parallel streams of tokens. Then, we leverage recent improvements in music source separation to train a multi-stream text-to-music language model on a large dataset. Finally, thanks to a particular conditioning method, our model is able to edit the bass, drums or other stems of existing or generated songs, and to perform iterative composition (e.g. generating bass on top of existing drums). This gives more flexibility to music generation algorithms and it is, to the best of our knowledge, the first open-source multi-stem autoregressive music generation model that can…
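The parallel-stream layout and the stem-editing idea described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the released code: the helper names (`stack_streams`, `mask_stem`, `fill_stem`), the use of `None` as a mask marker, the causal context window, and the stand-in `generate_token` callback, which replaces the actual language model.

```python
from typing import Callable, Dict, List, Optional

# Stem order is an assumption of this sketch; the abstract only names the stems.
STEMS = ("bass", "drums", "other")

Stream = List[Optional[int]]


def stack_streams(tokens_per_stem: Dict[str, List[int]]) -> List[Stream]:
    """Align per-stem token sequences into parallel streams, one row per stem."""
    lengths = {len(tokens_per_stem[s]) for s in STEMS}
    if len(lengths) != 1:
        raise ValueError("all stems must cover the same number of time steps")
    return [list(tokens_per_stem[s]) for s in STEMS]


def mask_stem(streams: List[Stream], stem: str) -> List[Stream]:
    """Hide one stem (None = to be generated) while keeping the others fixed."""
    idx = STEMS.index(stem)
    return [
        [None] * len(row) if i == idx else list(row)
        for i, row in enumerate(streams)
    ]


def fill_stem(
    masked: List[Stream],
    stem: str,
    generate_token: Callable[[Stream, List[Stream]], int],
) -> List[Stream]:
    """Fill the masked stem left to right, one token per time step.

    generate_token is a hypothetical stand-in for a trained model: it receives
    the tokens already generated for the edited stem and the other stems up to
    the current step (a causal window, chosen here only for simplicity).
    """
    idx = STEMS.index(stem)
    out = [list(row) for row in masked]
    for t in range(len(out[idx])):
        context = [row[: t + 1] for i, row in enumerate(out) if i != idx]
        out[idx][t] = generate_token(out[idx][:t], context)
    return out


# Toy usage: regenerate the bass on top of fixed drums and other stems.
streams = stack_streams({"bass": [1, 2, 3], "drums": [4, 5, 6], "other": [7, 8, 9]})
masked = mask_stem(streams, "bass")
edited = fill_stem(masked, "bass", lambda prev, ctx: sum(c[-1] for c in ctx) % 10)
```

In this sketch the drums and other rows pass through unchanged and only the bass row is rewritten, which mirrors the iterative-composition example in the abstract (generating bass on top of existing drums). How the actual model conditions on the fixed stems is not specified in the text above.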
