Scaling Self-Supervised Representation Learning for Symbolic Piano Performance

Louis Bradshaw; Honglu Fan; Alexander Spangher; Stella Biderman; Simon Colton

arXiv:2506.23869·cs.SD·July 1, 2025

TL;DR

This paper demonstrates that large-scale autoregressive transformer models trained on symbolic piano data can generate coherent music, produce high-quality embeddings for classification, and adapt efficiently to downstream tasks with minimal labeled data.

Contribution

It introduces a large-scale pretraining and finetuning framework for symbolic piano music that combines generative autoregressive pretraining with SimCLR-style contrastive finetuning.

Findings

01. Outperforms existing symbolic music generation methods in coherence

02. Achieves state-of-the-art MIR classification results with frozen embeddings

03. Requires only a few hundred labeled examples for effective downstream adaptation

Abstract

We study the capabilities of generative autoregressive transformer models trained on large amounts of symbolic solo-piano transcriptions. After first pretraining on approximately 60,000 hours of music, we use a comparatively smaller, high-quality subset to finetune models to produce musical continuations, perform symbolic classification tasks, and produce general-purpose contrastive MIDI embeddings by adapting the SimCLR framework to symbolic music. When evaluating piano continuation coherence, our generative model outperforms leading symbolic generation techniques and remains competitive with proprietary audio generation models. On MIR classification benchmarks, frozen representations from our contrastive model achieve state-of-the-art results in linear probe experiments, while direct finetuning demonstrates the generalizability of pretrained representations, often requiring only a…
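
The abstract mentions adapting the SimCLR framework to symbolic music. As a rough illustration of the contrastive objective SimCLR is built on (the NT-Xent loss), here is a minimal pure-Python sketch. It is not the authors' implementation: the function names, the temperature of 0.1, and the toy two-dimensional embeddings are assumptions made for illustration, and the way augmented views of a MIDI file would be produced is left out entirely.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def nt_xent_loss(view_a, view_b, temperature=0.1):
    """NT-Xent loss over a batch of paired embeddings.

    view_a[i] and view_b[i] are assumed to be embeddings of two augmented
    views of the same piece (a positive pair). Every other embedding in the
    batch acts as a negative. The temperature default is illustrative only.
    """
    if not view_a or len(view_a) != len(view_b):
        raise ValueError("view_a and view_b must be non-empty and equally long")

    n = len(view_a)
    z = [l2_normalize(v) for v in list(view_a) + list(view_b)]

    def sim(i, k):
        # Temperature-scaled cosine similarity between embeddings i and k.
        return sum(a * b for a, b in zip(z[i], z[k])) / temperature

    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of the same piece
        logits = [sim(i, k) for k in range(2 * n) if k != i]
        # Numerically stable log-sum-exp over all candidates except i itself.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum_exp - sim(i, positive)
    return total / (2 * n)


# Toy batch: two pieces, each with two slightly different views.
aligned = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.1], [0.1, 1.0]])
# Same embeddings, but the pairing is scrambled, so "positives" disagree.
shuffled = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[0.1, 1.0], [1.0, 0.1]])
```

In this toy batch the correctly paired views yield a much lower loss than the scrambled pairing, which is the pressure that pulls views of the same piece together in embedding space. A real training setup would compute the same quantity on model outputs with an autodiff framework.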

Code & Models

Models

🤗 loubb/aria-medium-base (model · 350 downloads · ♡ 11)

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Generative Adversarial Networks and Image Synthesis