MIDI-GPT: A Controllable Generative Model for Computer-Assisted Multitrack Music Composition

Philippe Pasquier; Jeff Ens; Nathan Fradet; Paul Triana; Davide Rizzotti; Jean-Baptiste Rolland; Maryam Safi

arXiv:2501.17011 · cs.SD · February 5, 2025

TL;DR

MIDI-GPT is a Transformer-based generative model for multitrack music composition that allows controllable, attribute-conditioned infilling and generates stylistically consistent music without copying training data.
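
The summary does not specify how bar-level infilling is encoded. Purely as an illustration under our own assumptions, the sketch below uses one common placeholder scheme. Each masked bar is replaced by a single placeholder token in the prompt. The model then produces the missing bars after the prompt, each wrapped in start/end tokens, and they are spliced back in order. All token names (TRACK_START, BAR_START, FILL_PLACEHOLDER, FILL_START, FILL_END, and note tokens such as N60) and both functions are hypothetical. They are not MIDI-GPT's actual vocabulary or API.

```python
from typing import Iterator, List, Set, Tuple

# Hypothetical token names; MIDI-GPT's real vocabulary may differ.
FILL_PLACEHOLDER = "FILL_PLACEHOLDER"
FILL_START = "FILL_START"
FILL_END = "FILL_END"

Bar = List[str]  # the note/time tokens of one bar


def build_infill_prompt(
    tracks: List[List[Bar]], masked: Set[Tuple[int, int]]
) -> Tuple[List[str], List[str]]:
    """Return (prompt, target) for masking the (track, bar) pairs in `masked`.

    Masked bars become a single placeholder in the prompt; their contents are
    collected in `target`, each wrapped in FILL_START/FILL_END, in prompt order.
    """
    prompt: List[str] = []
    target: List[str] = []
    for t, bars in enumerate(tracks):
        prompt.append("TRACK_START")
        for b, bar in enumerate(bars):
            if (t, b) in masked:
                prompt.append(FILL_PLACEHOLDER)
                target += [FILL_START, *bar, FILL_END]
            else:
                prompt += ["BAR_START", *bar, "BAR_END"]
        prompt.append("TRACK_END")
    return prompt, target


def splice_fills(prompt: List[str], generated: List[str]) -> List[str]:
    """Replace each placeholder in `prompt` with the next generated bar."""
    fills: List[Bar] = []
    current: List[str] = []
    inside = False
    for tok in generated:
        if tok == FILL_START:
            current, inside = [], True
        elif tok == FILL_END:
            fills.append(current)
            inside = False
        elif inside:
            current.append(tok)
    next_fill: Iterator[Bar] = iter(fills)
    out: List[str] = []
    for tok in prompt:
        if tok == FILL_PLACEHOLDER:
            out += ["BAR_START", *next(next_fill), "BAR_END"]
        else:
            out.append(tok)
    return out


# Two tracks with two bars each; mask bar 1 of track 0.
tracks = [[["N60"], ["N62"]], [["N36"], ["N38"]]]
prompt, target = build_infill_prompt(tracks, {(0, 1)})
# In training, the model would see prompt + target. At generation time it
# receives only the prompt and produces a target-like continuation.
restored = splice_fills(prompt, target)  # equals the fully unmasked sequence
```

Because the missing bars are generated after the whole prompt, a scheme like this lets the model condition on material both before and after each gap. A purely left-to-right encoding of the piece would not allow that.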

Contribution

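It introduces an alternative multitrack representation, in which each track is encoded as its own time-ordered event sequence and the tracks are concatenated. It pairs this representation with attribute-control mechanisms for Transformer-based music generation, enabling flexible composition workflows such as track- and bar-level infilling.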
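
The abstract lists note density, polyphony level, and note duration among the attributes MIDI-GPT can condition on. One plausible way to turn such attributes into conditioning tokens is to measure each one on a track and quantize it into a few bins. The sketch below assumes that approach. The Note class, the chosen measures (onsets per bar, maximum simultaneous notes, mean duration), the bin edges, and the token spelling are all hypothetical. The paper's exact definitions are not given here.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Note:
    pitch: int
    start: float  # onset, in beats
    end: float    # offset, in beats


def note_density(notes: Sequence[Note], n_bars: int) -> float:
    """Average number of note onsets per bar."""
    return len(notes) / n_bars


def max_polyphony(notes: Sequence[Note]) -> int:
    """Largest number of notes sounding at the same time (sweep over events)."""
    # (time, +1) at each onset, (time, -1) at each offset. At equal times the
    # -1 sorts first, so a note ending exactly when another starts is not overlap.
    events = sorted([(n.start, 1) for n in notes] + [(n.end, -1) for n in notes])
    level = peak = 0
    for _, delta in events:
        level += delta
        peak = max(peak, level)
    return peak


def mean_duration(notes: Sequence[Note]) -> float:
    """Mean note length, in beats."""
    return sum(n.end - n.start for n in notes) / len(notes)


def control_token(name: str, value: float, edges: Sequence[float]) -> str:
    """Quantize `value` to a bin index: the number of edges <= value."""
    return f"{name}_{bisect_right(edges, value)}"


# Hypothetical bin edges, for illustration only.
DENSITY_EDGES = [1, 2, 4, 8, 16]
POLYPHONY_EDGES = [1, 2, 3, 4, 6]
DURATION_EDGES = [0.25, 0.5, 1.0, 2.0]

notes = [Note(60, 0, 1), Note(64, 0, 1), Note(67, 0.5, 2), Note(72, 1, 2)]
controls: List[str] = [
    control_token("DENSITY", note_density(notes, n_bars=1), DENSITY_EDGES),
    control_token("POLYPHONY", max_polyphony(notes), POLYPHONY_EDGES),
    control_token("DURATION", mean_duration(notes), DURATION_EDGES),
]
# controls == ["DENSITY_3", "POLYPHONY_3", "DURATION_3"]
```

In a scheme like this, the tokens would be computed from the data during training and placed ahead of the material they describe. At generation time, the user would set them directly to steer the output.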

Findings

01. MIDI-GPT avoids copying training material.

02. It generates music stylistically similar to the training data.

03. Attribute controls effectively constrain the generated music.

Abstract

We present and release MIDI-GPT, a generative system based on the Transformer architecture that is designed for computer-assisted music composition workflows. MIDI-GPT supports the infilling of musical material at the track and bar level, and can condition generation on attributes including instrument type, musical style, note density, polyphony level, and note duration. In order to integrate these features, we employ an alternative representation for musical material, creating a time-ordered sequence of musical events for each track and concatenating several tracks into a single sequence, rather than using a single time-ordered sequence where the musical events corresponding to different tracks are interleaved. We also propose a variation of our representation allowing for expressiveness. We present experimental results that demonstrate that MIDI-GPT is able to consistently avoid…
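
The abstract contrasts two layouts. One is a single time-ordered stream in which events from different tracks are interleaved. The other, MIDI-GPT's approach, encodes each track as its own time-ordered sequence and concatenates the tracks. The toy encoder below compares the two under simplifying assumptions of our own. Notes are (onset tick, pitch) pairs and timing is written as TIME_SHIFT tokens. Bars, durations, note-offs, and expressive timing are omitted. Token names and functions are hypothetical and do not reproduce MIDI-GPT's actual vocabulary.

```python
from typing import Dict, List, Tuple

Note = Tuple[int, int]  # (onset_tick, MIDI pitch); hypothetical simplification


def _encode(events: List[Tuple[int, str]]) -> List[str]:
    """Emit labels in time order, inserting TIME_SHIFT tokens when time advances."""
    out: List[str] = []
    now = 0
    for onset, label in events:
        if onset > now:
            out.append(f"TIME_SHIFT={onset - now}")
            now = onset
        out.append(label)
    return out


def interleaved(tracks: Dict[str, List[Note]]) -> List[str]:
    """One clock for the whole piece; each note is tagged with its track."""
    events = [(on, f"{name}:NOTE_ON={p}") for name, notes in tracks.items() for on, p in notes]
    # sorted() is stable, so simultaneous notes keep track order.
    return _encode(sorted(events, key=lambda e: e[0]))


def track_concatenated(tracks: Dict[str, List[Note]]) -> List[str]:
    """Each track is a self-contained time-ordered span; spans are concatenated."""
    out: List[str] = []
    for name, notes in tracks.items():
        out += ["TRACK_START", f"INSTRUMENT={name}"]
        # The clock restarts at 0 for every track.
        out += _encode(sorted(((on, f"NOTE_ON={p}") for on, p in notes), key=lambda e: e[0]))
        out.append("TRACK_END")
    return out


piece = {"piano": [(0, 60), (4, 62)], "bass": [(0, 36), (4, 38)]}
print(interleaved(piece))
print(track_concatenated(piece))
```

In the concatenated layout every track occupies one contiguous span, so regenerating a single track amounts to replacing one span while the other tracks stay intact as context. In the interleaved layout, that track's tokens are scattered through the whole sequence.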


Taxonomy

Topics: Music Technology and Sound Studies · Music and Audio Processing · Neuroscience and Music Perception

Methods: Attention Is All You Need · Softmax · Adam · Residual Connection · Dropout · Absolute Position Encodings · Byte Pair Encoding · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer