MusiConGen: Rhythm and Chord Control for Transformer-Based Text-to-Music Generation

Yun-Han Lan; Wen-Yi Hsiao; Hao-Chung Cheng; Yi-Hsuan Yang

arXiv:2407.15060·cs.SD·July 23, 2024·1 cites

Open Access · 2 Repos

TL;DR

MusiConGen is a Transformer-based text-to-music model that enables precise control over rhythm and chords during music generation by incorporating automatically-extracted or user-defined musical features, improving controllability and realism.

Contribution

The paper introduces MusiConGen, a novel fine-tuning approach for Transformer-based text-to-music models that allows explicit control over musical features like rhythm and chords.

Findings

01 MusiConGen produces realistic backing tracks aligned with specified musical features.

02 The model effectively integrates reference audio or user-defined inputs for controlled music generation.

03 Open-source code and demos are provided for community use and validation.

Abstract

Existing text-to-music models can produce high-quality audio with great diversity. However, textual prompts alone cannot precisely control temporal musical features such as chords and rhythm of the generated music. To address this challenge, we introduce MusiConGen, a temporally-conditioned Transformer-based text-to-music model that builds upon the pretrained MusicGen framework. Our innovation lies in an efficient finetuning mechanism, tailored for consumer-grade GPUs, that integrates automatically-extracted rhythm and chords as the condition signal. During inference, the condition can be either musical features extracted from a reference audio signal or a user-defined symbolic chord sequence, BPM, and textual prompts. Our performance evaluation on two datasets -- one derived from extracted features and the other from user-created inputs -- demonstrates that MusiConGen can generate…
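To make the idea of a user-defined symbolic condition concrete, here is a minimal sketch of how a chord sequence and a BPM value could be expanded into frame-wise signals aligned with a model's time axis. This is an illustration only, not the authors' implementation: the function name `chords_to_frames`, the `(chord symbol, number of beats)` input format, and the default frame rate of 50 frames per second are all assumptions made for this example.

```python
from typing import List, Tuple


def chords_to_frames(
    chords: List[Tuple[str, int]],
    bpm: float,
    frame_rate: int = 50,  # assumed frames per second; not taken from the paper
) -> Tuple[List[str], List[int]]:
    """Expand (chord_symbol, n_beats) pairs into per-frame chord labels
    and a per-frame beat indicator (1 on frames where a beat falls)."""
    sec_per_beat = 60.0 / bpm
    total_beats = sum(n for _, n in chords)
    n_frames = round(total_beats * sec_per_beat * frame_rate)

    # Mark the frame closest to each beat onset.
    beats = [0] * n_frames
    for b in range(total_beats):
        idx = round(b * sec_per_beat * frame_rate)
        if idx < n_frames:
            beats[idx] = 1

    # Repeat each chord symbol over the frames it spans.
    labels: List[str] = []
    start_beat = 0
    for symbol, n in chords:
        start = round(start_beat * sec_per_beat * frame_rate)
        end = round((start_beat + n) * sec_per_beat * frame_rate)
        labels.extend([symbol] * (end - start))
        start_beat += n

    return labels, beats


# Hypothetical input: four beats of C followed by four beats of G at 120 BPM.
labels, beats = chords_to_frames([("C", 4), ("G", 4)], bpm=120)
```

At 120 BPM and the assumed 50 frames per second, each beat spans 25 frames, so this example yields 200 frames with the chord change at frame 100. A real system would additionally encode the chord symbols numerically before passing them to the model.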

Taxonomy

Topics: Music Technology and Sound Studies · Music and Audio Processing · Human Motion and Animation