Generating Piano Music with Transformers: A Comparative Study of Scale, Data, and Metrics
Jonathan Lehmkuhl, Ábel Ilyés-Kun, Nico Bremes, Cemhan Kaan Özaltan, Frederik Muthers, Jiayi Yuan

TL;DR
This paper systematically compares transformer models, datasets, and evaluation metrics for symbolic piano music generation. It shows how design choices affect output quality and measures how well quantitative metrics correlate with human judgments.
Contribution
It analyzes how datasets, model architectures, model sizes, and training strategies affect transformer-based music generation, and evaluates how well different quantitative metrics agree with human preferences.
Findings
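The best-performing model, a 950M-parameter transformer trained on 80K MIDI files from diverse genres, produces outputs that listeners often rated as human-composed in a Turing-style survey.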
Quantitative metrics show varying degrees of correlation with human judgment.
Model architecture and dataset diversity significantly influence generated music quality.
Abstract
Although a variety of transformers have been proposed for symbolic music generation in recent years, there is still little comprehensive study on how specific design choices affect the quality of the generated music. In this work, we systematically compare different datasets, model architectures, model sizes, and training strategies for the task of symbolic piano music generation. To support model development and evaluation, we examine a range of quantitative metrics and analyze how well they correlate with human judgment collected through listening studies. Our best-performing model, a 950M-parameter transformer trained on 80K MIDI files from diverse genres, produces outputs that are often rated as human-composed in a Turing-style listening survey.
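The abstract does not state which correlation statistic the study uses to compare metrics with listener judgments. As a hedged illustration only, a common choice is Spearman's rank correlation, which asks whether a metric orders models the same way human ratings do. The sketch below assumes one metric score and one mean human rating per model; the function names and the numbers in the usage example are hypothetical and are not results from the paper.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical per-model values, for illustration only.
    metric_scores = [0.42, 0.55, 0.61, 0.70]
    human_ratings = [2.1, 3.0, 2.8, 3.9]
    print(spearman(metric_scores, human_ratings))  # 0.8
```

A rank-based statistic is used here because it only assumes that higher metric scores should go with higher ratings, not that the relationship is linear; with few models, any such coefficient is a noisy estimate.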
