Scaling Self-Supervised Representation Learning for Symbolic Piano Performance
Louis Bradshaw, Honglu Fan, Alexander Spangher, Stella Biderman, Simon Colton

TL;DR
This paper demonstrates that large-scale autoregressive transformer models trained on symbolic piano data can generate coherent music, produce high-quality embeddings for classification, and adapt efficiently to downstream tasks with minimal labeled data.
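Generating a continuation with an autoregressive model means repeatedly predicting a distribution over the next token, sampling from it, and appending the result. The sketch below shows that loop using nucleus (top-p) sampling. This page does not state the paper's tokenizer or sampling settings, so nucleus sampling is a common decoding choice swapped in for illustration. The callable `next_token_logits`, the integer token IDs and `eos_id` are all hypothetical stand-ins for a trained model and its vocabulary.

```python
import math
import random
from typing import Callable, List, Sequence


def top_p_sample(logits: Sequence[float], p: float, rng: random.Random,
                 temperature: float = 1.0) -> int:
    """Nucleus sampling: draw from the smallest set of tokens whose
    cumulative probability reaches p."""
    scaled = [v / temperature for v in logits]
    m = max(scaled)  # subtract the max for a numerically stable softmax
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, cum = [], 0.0
    for i in ranked:
        nucleus.append(i)
        cum += probs[i]
        if cum >= p:
            break
    return rng.choices(nucleus, weights=[probs[i] for i in nucleus], k=1)[0]


def continue_sequence(prompt: List[int],
                      next_token_logits: Callable[[List[int]], Sequence[float]],
                      max_new_tokens: int, eos_id: int,
                      p: float = 0.95, seed: int = 0) -> List[int]:
    """Autoregressively extend a token prompt (hypothetical model interface)."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tok = top_p_sample(next_token_logits(tokens), p, rng)
        tokens.append(tok)
        if tok == eos_id:  # stop once the model emits end-of-sequence
            break
    return tokens
```

Usage: with a toy "model" that strongly prefers `(last + 1) % 5`, `continue_sequence([0], toy, 3, eos_id=99, p=0.9)` yields `[0, 1, 2, 3]`. A real system would replace `toy` with a forward pass of the transformer over the tokenized MIDI prompt.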
Contribution
The paper introduces a large-scale pretraining and finetuning framework for symbolic piano music that combines generative (autoregressive) and contrastive (SimCLR-style) learning.
Findings
Outperforms leading symbolic music generation methods on piano continuation coherence
Achieves state-of-the-art MIR classification results with frozen embeddings
Requires only a few hundred labeled examples for effective downstream adaptation
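A linear probe measures how much task-relevant information frozen embeddings already contain. The encoder is left untouched, and only a linear classifier is trained on its outputs. The sketch below fits a softmax (multinomial logistic regression) classifier with plain SGD on precomputed embedding vectors. It assumes the embeddings have already been extracted. The function names, learning rate and epoch count are illustrative choices, not the paper's probe configuration. In practice a library classifier such as scikit-learn's `LogisticRegression` would typically be used instead.

```python
import math
import random
from typing import List, Sequence, Tuple


def _logits(W: List[List[float]], b: List[float], x: Sequence[float]) -> List[float]:
    return [sum(w * xi for w, xi in zip(row, x)) + bc for row, bc in zip(W, b)]


def _softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]


def train_linear_probe(features: Sequence[Sequence[float]], labels: Sequence[int],
                       num_classes: int, lr: float = 0.1, epochs: int = 200,
                       seed: int = 0) -> Tuple[List[List[float]], List[float]]:
    """Fit a softmax classifier on frozen embeddings; the encoder is never updated."""
    dim = len(features[0])
    rng = random.Random(seed)
    W = [[0.0] * dim for _ in range(num_classes)]
    b = [0.0] * num_classes
    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)
        for idx in order:
            x, y = features[idx], labels[idx]
            probs = _softmax(_logits(W, b, x))
            for c in range(num_classes):
                # Gradient of cross-entropy w.r.t. logit c is (p_c - 1[c == y]).
                grad = probs[c] - (1.0 if c == y else 0.0)
                b[c] -= lr * grad
                for d in range(dim):
                    W[c][d] -= lr * grad * x[d]
    return W, b


def predict(W: List[List[float]], b: List[float], x: Sequence[float]) -> int:
    logits = _logits(W, b, x)
    return max(range(len(logits)), key=logits.__getitem__)
```

Because only `W` and `b` are learned, probe accuracy reflects the quality of the representation rather than the capacity of the classifier. This makes probing a cheap way to compare embeddings across MIR tasks.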
Abstract
We study the capabilities of generative autoregressive transformer models trained on large amounts of symbolic solo-piano transcriptions. After first pretraining on approximately 60,000 hours of music, we use a comparatively small, high-quality subset to finetune models to produce musical continuations, perform symbolic classification tasks, and produce general-purpose contrastive MIDI embeddings by adapting the SimCLR framework to symbolic music. When evaluating piano continuation coherence, our generative model outperforms leading symbolic generation techniques and remains competitive with proprietary audio generation models. On MIR classification benchmarks, frozen representations from our contrastive model achieve state-of-the-art results in linear probe experiments, while direct finetuning demonstrates the generalizability of pretrained representations, often requiring only a…
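SimCLR learns embeddings by pulling together two augmented "views" of the same example and pushing apart views of different examples, using the NT-Xent (normalized temperature-scaled cross-entropy) loss. The sketch below computes the standard SimCLR NT-Xent loss over a batch of embeddings. This page does not describe how the paper adapts SimCLR to MIDI (augmentations, projection head, temperature), so those details are not modelled here. The pairing convention (rows `2k` and `2k + 1` are two views of the same excerpt) and the default temperature of 0.1 are illustrative assumptions.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def nt_xent_loss(z: Sequence[Sequence[float]], temperature: float = 0.1) -> float:
    """Standard SimCLR NT-Xent loss for 2N embeddings.

    Assumed convention: rows 2k and 2k+1 are two augmented views of the
    same MIDI excerpt (a positive pair); every other row is a negative.
    """
    if len(z) % 2 != 0:
        raise ValueError("expected an even number of embeddings (pairs of views)")
    zn = [_normalize(v) for v in z]  # cosine similarity = dot of unit vectors
    n = len(zn)
    total = 0.0
    for i in range(n):
        j = i ^ 1  # positive partner: 0<->1, 2<->3, ...
        sims = {k: _dot(zn[i], zn[k]) / temperature for k in range(n) if k != i}
        m = max(sims.values())  # log-sum-exp for numerical stability
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += log_denom - sims[j]  # -log softmax probability of the positive
    return total / n
```

For example, a batch whose positive pairs point in the same direction (`[[1, 0], [1, 0], [0, 1], [0, 1]]`) yields a loss close to zero, while mismatched pairs (`[[1, 0], [0, 1], [0, 1], [1, 0]]`) yield a loss of roughly 10 at this temperature.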
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Generative Adversarial Networks and Image Synthesis
