Simple and Controllable Music Generation

Jade Copet; Felix Kreuk; Itai Gat; Tal Remez; David Kant; Gabriel Synnaeve; Yossi Adi; Alexandre Défossez

arXiv:2306.05284 · cs.SD · January 31, 2024 · 65 cites

TL;DR

MusicGen is a novel single-transformer model for conditional music generation that produces high-quality mono and stereo samples controlled by text or melodic features, outperforming baselines.

Contribution

Introduces MusicGen, a single-stage transformer model with efficient token interleaving for controllable music generation, simplifying previous multi-model approaches.

Findings

01. MusicGen outperforms baselines on a standard text-to-music benchmark.

02. The model generates high-quality mono and stereo music conditioned on text or melodic features.

03. Ablation studies highlight the importance of each component in MusicGen.

Abstract

We tackle the task of conditional music generation. We introduce MusicGen, a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens. Unlike prior work, MusicGen is comprised of a single-stage transformer LM together with efficient token interleaving patterns, which eliminates the need for cascading several models, e.g., hierarchically or upsampling. Following this approach, we demonstrate how MusicGen can generate high-quality samples, both mono and stereo, while being conditioned on textual description or melodic features, allowing better controls over the generated output. We conduct extensive empirical evaluation, considering both automatic and human studies, showing the proposed approach is superior to the evaluated baselines on a standard text-to-music benchmark. Through ablation studies, we shed light over the…
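The abstract mentions token interleaving patterns without detailing them. As a minimal sketch of one pattern of this kind, often called a delay pattern, the code below shifts codebook stream k to the right by k steps, so a single model can predict one token per stream at each step while each later stream lags the earlier ones. The function names, the `PAD` placeholder, and the toy token values are illustrative assumptions, not taken from this page or from any released implementation.

```python
from typing import List, Optional

# Hypothetical placeholder for positions that hold no token after shifting.
PAD = None


def delay_interleave(codes: List[List[int]]) -> List[List[Optional[int]]]:
    """Shift codebook stream k right by k steps.

    codes[k][t] is the token of codebook k at time step t.
    The result has len(codes[0]) + len(codes) - 1 steps per stream.
    """
    num_codebooks = len(codes)
    num_steps = len(codes[0])
    total_steps = num_steps + num_codebooks - 1
    delayed = [[PAD] * total_steps for _ in range(num_codebooks)]
    for k, stream in enumerate(codes):
        for t, token in enumerate(stream):
            delayed[k][t + k] = token
    return delayed


def undo_delay(delayed: List[List[Optional[int]]]) -> List[List[Optional[int]]]:
    """Invert delay_interleave by slicing each stream back to its own offset."""
    num_codebooks = len(delayed)
    num_steps = len(delayed[0]) - num_codebooks + 1
    return [delayed[k][k:k + num_steps] for k in range(num_codebooks)]


# Toy example: 2 codebooks, 3 time steps.
codes = [[1, 2, 3], [4, 5, 6]]
delayed = delay_interleave(codes)
# delayed == [[1, 2, 3, None], [None, 4, 5, 6]]
restored = undo_delay(delayed)
# restored == [[1, 2, 3], [4, 5, 6]]
```

With K streams of T steps each, this layout needs T + K - 1 decoding steps, compared with K × T steps if all streams were flattened into one sequence.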

Code & Models

Videos

New AI Listened To 20,000 Hours Of Music. What Did It Learn? · YouTube

Simple and Controllable Music Generation · SlidesLive

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing