Text2Score: Generating Sheet Music From Textual Prompts
Keshav Bhandari, Sungkyun Chang, Abhinaba Roy, Francesca Ronchini, Emmanouil Benetos, Dorien Herremans, Simon Colton

TL;DR
Text2Score is a two-stage framework that generates sheet music from natural language prompts. It combines supervision derived from symbolic data with structured planning, and it outperforms baseline models in objective and subjective evaluations.
Contribution
The paper introduces a framework for text-driven sheet music generation that sidesteps the scarcity of paired text-music data by deriving supervision directly from symbolic data and by using an LLM for structured planning.
Findings
Outperforms baseline models in objective and subjective evaluations.
Introduces an evaluation framework covering multiple musical quality aspects.
Open-sources dataset, code, and evaluation tools.
Abstract
Developing text-driven symbolic music generation models remains challenging due to the scarcity of aligned text-music datasets and the unreliability of automated captioning pipelines. While most efforts have focused on MIDI, sheet music representations are largely underexplored in text-driven generation. We present Text2Score, a two-stage framework comprising a planning stage and an execution stage for generating sheet music from natural language prompts. By deriving supervision signals directly from symbolic XML data, we propose an alternative training paradigm that bypasses noisy or scarce text-music pairs. In the planning stage, an LLM orchestrator translates a natural language prompt into a structured measure-wise plan defining musical attributes such as instruments, key, time signatures, harmony, etc. This plan is then consumed by a generative model in the execution stage to…
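As an illustration of how measure-wise attributes might be read directly from symbolic XML, the following is a minimal sketch that assumes the data is MusicXML. It parses a small hand-written fragment with Python's standard library. The function name `measure_plan` and the output fields (`instrument`, `measure`, `key_fifths`, `time`) are hypothetical and are not the plan schema used by Text2Score; harmony is omitted because it would require chord analysis.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written MusicXML fragment (illustrative only, not from the paper's dataset).
MUSICXML = """<score-partwise version="3.1">
  <part-list>
    <score-part id="P1"><part-name>Violin</part-name></score-part>
  </part-list>
  <part id="P1">
    <measure number="1">
      <attributes>
        <key><fifths>2</fifths><mode>major</mode></key>
        <time><beats>3</beats><beat-type>4</beat-type></time>
      </attributes>
    </measure>
    <measure number="2"/>
    <measure number="3">
      <attributes>
        <time><beats>4</beats><beat-type>4</beat-type></time>
      </attributes>
    </measure>
  </part>
</score-partwise>"""


def measure_plan(xml_text):
    """Return one dict per measure with instrument, key, and time signature.

    Hypothetical schema: the field names are assumptions made for this sketch.
    """
    root = ET.fromstring(xml_text)
    # Map part ids to instrument names declared in <part-list>.
    names = {sp.get("id"): sp.findtext("part-name") for sp in root.iter("score-part")}
    plan = []
    for part in root.findall("part"):
        fifths, time_sig = None, None  # carried forward until a measure changes them
        for measure in part.findall("measure"):
            attrs = measure.find("attributes")
            if attrs is not None:
                f = attrs.findtext("key/fifths")
                if f is not None:
                    fifths = int(f)  # sharps (+) or flats (-) in the key signature
                beats = attrs.findtext("time/beats")
                beat_type = attrs.findtext("time/beat-type")
                if beats and beat_type:
                    time_sig = f"{beats}/{beat_type}"
            plan.append({
                "instrument": names.get(part.get("id")),
                "measure": measure.get("number"),
                "key_fifths": fifths,
                "time": time_sig,
            })
    return plan


if __name__ == "__main__":
    for row in measure_plan(MUSICXML):
        print(row)
```

Key and time signature are carried forward from the last measure that declared them, which matches how MusicXML only restates `<attributes>` when something changes.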
