Text2midi: Generating Symbolic Music from Captions

Keshav Bhandari; Abhinaba Roy; Kyra Wang; Geeta Puri; Simon Colton; Dorien Herremans

arXiv:2412.16526 · cs.SD · June 18, 2025

TL;DR

Text2midi is an end-to-end system that generates MIDI files from textual descriptions. A pretrained LLM encoder processes the caption and conditions an autoregressive transformer decoder, so users can steer the generated music with a text prompt.

Contribution

The paper pairs a pretrained LLM encoder with an autoregressive transformer decoder to generate symbolic music from text, so that a piece can be produced from a prompt alone.

Findings

01. High-quality MIDI generation controllable by text prompts
02. Effective use of LLMs for symbolic music synthesis
03. Positive results from automated and human evaluations

Abstract

This paper introduces text2midi, an end-to-end model to generate MIDI files from textual descriptions. Leveraging the growing popularity of multimodal generative approaches, text2midi capitalizes on the extensive availability of textual data and the success of large language models (LLMs). Our end-to-end system harnesses the power of LLMs to generate symbolic music in the form of MIDI files. Specifically, we utilize a pretrained LLM encoder to process captions, which then condition an autoregressive transformer decoder to produce MIDI sequences that accurately reflect the provided descriptions. This intuitive and user-friendly method significantly streamlines the music creation process by allowing users to generate music pieces using text prompts. We conduct comprehensive empirical evaluations, incorporating both automated and human studies, that show our model generates MIDI files of…
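The conditioning scheme described in the abstract (encode the caption once, then let a decoder emit MIDI tokens one at a time) can be sketched as a greedy decoding loop. This is only an illustration of the control flow, not the authors' implementation: `encode_caption`, `toy_decoder_logits`, the 16-token vocabulary, and the `BOS`/`EOS` ids are all hypothetical stand-ins for the pretrained LLM encoder, the transformer decoder, and the MIDI tokenizer, none of which are specified in this summary.

```python
from typing import List, Sequence

# Hypothetical toy vocabulary; the paper's actual MIDI tokenizer is not shown here.
BOS, EOS = 0, 1
VOCAB_SIZE = 16


def encode_caption(caption: str) -> List[float]:
    """Stand-in for the pretrained LLM encoder (assumption, not the real model).

    Maps a caption to a small fixed context vector: word count and mean word length.
    """
    words = caption.lower().split()
    mean_len = sum(len(w) for w in words) / max(len(words), 1)
    return [float(len(words)), mean_len]


def toy_decoder_logits(prefix: Sequence[int], context: Sequence[float]) -> List[float]:
    """Stand-in for the autoregressive transformer decoder (assumption).

    Scores every vocabulary token given the tokens generated so far and the
    caption context. A real decoder would use causal self-attention over the
    prefix and cross-attention over the encoder states.
    """
    step = int(context[0]) or 1  # caption-dependent stride between tokens
    target = 2 + (prefix[-1] + step) % (VOCAB_SIZE - 2)  # never BOS or EOS
    logits = [0.0] * VOCAB_SIZE
    logits[target] = 1.0
    # Favour EOS once the sequence reaches a caption-dependent length.
    if len(prefix) > 4 + int(context[1]):
        logits[EOS] = 2.0
    return logits


def generate(caption: str, max_len: int = 32) -> List[int]:
    """Greedy autoregressive decoding conditioned on a caption."""
    context = encode_caption(caption)  # the caption is encoded once
    tokens = [BOS]
    while len(tokens) < max_len:
        logits = toy_decoder_logits(tokens, context)
        # Greedy choice: take the highest-scoring token.
        next_token = max(range(VOCAB_SIZE), key=logits.__getitem__)
        tokens.append(next_token)
        if next_token == EOS:
            break
    return tokens


if __name__ == "__main__":
    print(generate("a slow melancholic piano piece"))
```

The point of the sketch is the separation of roles: the caption context is computed once and stays fixed, while the decoder is called repeatedly on a growing token prefix. Greedy selection is used here only for simplicity; the summary does not say which sampling strategy the model uses.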

Code & Models

Repositories

amaai-lab/text2midi (PyTorch, official)

Videos

Text2midi: Generating Symbolic Music from Captions · Underline

Taxonomy

Topics: Music and Audio Processing · Digital Humanities and Scholarship · Natural Language Processing Techniques