Generating Symbolic Music from Natural Language Prompts using an LLM-Enhanced Dataset

Weihan Xu; Julian McAuley; Taylor Berg-Kirkpatrick; Shlomo Dubnov; Hao-Wen Dong

arXiv:2410.02084·cs.SD·June 17, 2025

TL;DR

This paper introduces MetaScore, a large symbolic music dataset with rich metadata, and employs an LLM to generate pseudo captions for training models that convert natural language prompts into symbolic music, enabling more controllable music generation.

Contribution

The work presents MetaScore, a novel large-scale symbolic music dataset with metadata, and demonstrates how LLM-generated captions can improve text-to-music models for controllable generation.

Findings

01 Models outperform baseline in listening tests

02 Text-to-music offers more natural user interface

03 Comparable performance to concurrent Text2MIDI work

Abstract

Recent years have seen many audio-domain text-to-music generation models that rely on large amounts of text-audio pairs for training. However, symbolic-domain controllable music generation has lagged behind, partly due to the lack of a large-scale symbolic music dataset with extensive metadata and captions. In this work, we present MetaScore, a new dataset consisting of 963K musical scores paired with rich metadata, including free-form user-annotated tags, collected from an online music forum. To approach text-to-music generation, we employ a pretrained large language model (LLM) to generate pseudo-natural language captions for music from its metadata tags. With the LLM-enhanced MetaScore, we train a text-conditioned music generation model that learns to generate symbolic music from the pseudo captions, allowing control of instruments, genre, composer, complexity and other free-form…
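
The caption step described in the abstract (metadata tags in, pseudo caption out) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `ScoreMetadata` fields, the `build_caption_prompt` helper, and the prompt wording are hypothetical and are not taken from the paper or from MetaScore's actual schema. The sketch only assembles the text that would be sent to an LLM; it does not call any model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ScoreMetadata:
    """Hypothetical metadata record; field names are illustrative only."""
    instruments: List[str] = field(default_factory=list)
    genre: Optional[str] = None
    composer: Optional[str] = None
    complexity: Optional[str] = None
    tags: List[str] = field(default_factory=list)


def build_caption_prompt(meta: ScoreMetadata) -> str:
    """Assemble an instruction that an LLM could expand into a pseudo caption."""
    facts = []
    # Only include fields that are actually present, so the LLM is not
    # nudged to invent attributes the score does not have.
    if meta.instruments:
        facts.append("instruments: " + ", ".join(meta.instruments))
    if meta.genre:
        facts.append(f"genre: {meta.genre}")
    if meta.composer:
        facts.append(f"composer: {meta.composer}")
    if meta.complexity:
        facts.append(f"complexity: {meta.complexity}")
    if meta.tags:
        facts.append("tags: " + ", ".join(meta.tags))
    # Assumed prompt wording; the paper's actual prompt is not shown here.
    return (
        "Write one natural-sounding sentence describing a piece of music "
        "using only these facts: " + "; ".join(facts)
    )


if __name__ == "__main__":
    example = ScoreMetadata(
        instruments=["piano", "violin"], genre="classical", tags=["calm"]
    )
    print(build_caption_prompt(example))
```

The returned string would then be passed to whichever pretrained LLM is used, and the model's reply stored as the pseudo caption paired with the score.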

Taxonomy

Topics: Music and Audio Processing · Diverse Musicological Studies · Music Technology and Sound Studies