MusicGen-Stem: Multi-stem music generation and edition through autoregressive modeling
Simon Rouard, Robin San Roman, Yossi Adi, Axel Roebel

TL;DR
MusicGen-Stem introduces a multi-stem autoregressive music generation model that learns dependencies between bass, drums, and other stems, enabling high-quality generation and flexible editing of individual stems within songs.
Contribution
To the authors' knowledge, it is the first open-source multi-stem autoregressive music model capable of both high-quality generation and coherent stem editing, leveraging a specialized tokenizer for each stem and recent advances in music source separation.
Findings
Effective multi-stem generation with coherent dependencies
Ability to edit individual stems in existing or generated music
Open-source release with code, models, and samples
Abstract
While most music generation models generate a mixture of stems (in mono or stereo), we propose to train a multi-stem generative model with 3 stems (bass, drums and other) that learns the musical dependencies between them. To do so, we train one specialized compression algorithm per stem to tokenize the music into parallel streams of tokens. Then, we leverage recent improvements in the task of music source separation to train a multi-stream text-to-music language model on a large dataset. Finally, thanks to a particular conditioning method, our model is able to edit bass, drums or other stems on existing or generated songs, as well as to perform iterative composition (e.g. generating bass on top of existing drums). This gives more flexibility in music generation algorithms, and it is, to the best of our knowledge, the first open-source multi-stem autoregressive music generation model that can…
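The stem-editing idea in the abstract can be illustrated with a minimal sketch: one stem is regenerated autoregressively, frame by frame, while the other stems are held fixed as context. This is not the paper's method. It assumes each stem has already been tokenized into a single stream of integer codes, one per frame (real neural codecs usually emit several codebooks per frame), and it substitutes a hypothetical `sample_next` callable for the trained language model. The names `edit_stem`, `sample_next` and `toy_sampler` are illustrative only; the paper's actual conditioning method is not detailed in this abstract.

```python
from typing import Callable, Dict, List

# The three stems named in the abstract.
STEMS = ("bass", "drums", "other")

# Simplifying assumption: one integer token per frame per stem.
Streams = Dict[str, List[int]]

# Hypothetical stand-in for the trained model: given the current streams,
# the stem being generated and the frame index, return the next token.
Sampler = Callable[[Streams, str, int], int]


def edit_stem(streams: Streams, target: str, sample_next: Sampler) -> Streams:
    """Regenerate `target` frame by frame, keeping the other stems fixed as context."""
    if target not in STEMS:
        raise ValueError(f"unknown stem: {target!r}")
    lengths = {len(streams[s]) for s in STEMS}
    if len(lengths) != 1:
        raise ValueError("all stems must have the same number of frames")
    n_frames = lengths.pop()

    # Copy so the caller's streams are left untouched.
    out: Streams = {s: list(streams[s]) for s in STEMS}
    out[target] = []  # discard the old tokens of the edited stem
    for t in range(n_frames):
        # Autoregressive step: the sampler sees all frames of the fixed stems
        # and only frames 0..t-1 of the target stem.
        out[target].append(sample_next(out, target, t))
    return out


def toy_sampler(ctx: Streams, target: str, t: int) -> int:
    """Toy 'model': the new token depends on the other stems at the same frame."""
    return sum(ctx[s][t] for s in STEMS if s != target) % 16


if __name__ == "__main__":
    song = {"bass": [0, 0, 0], "drums": [1, 2, 3], "other": [4, 5, 6]}
    edited = edit_stem(song, "bass", toy_sampler)
    print(edited["bass"])  # [5, 7, 9]
```

Iterative composition, such as generating bass on top of existing drums, follows the same pattern in this sketch: the known stems act as fixed context and only the target stream is sampled.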
