Transformer-Based Rhythm Quantization of Performance MIDI Using Beat Annotations

Maximilian Wachter; Sebastian Murgul; Michael Heizmann

arXiv:2604.22290 · cs.SD · April 27, 2026

TL;DR

This paper introduces a transformer-based framework for rhythm quantization of MIDI performances that uses beat annotations and data augmentation, reaching a 97.3% onset F1-score on the ASAP dataset.

Contribution

The paper adapts a T5-based transformer to MIDI rhythm quantization, combining a beat-based pre-quantization step, a task-specific MIDI tokenizer, and data augmentation.

Findings

01. Achieved 97.3% onset F1-score on the ASAP dataset.

02. Generalized well across unseen time signatures.

03. Fine-tuning improved performance on instrument-specific datasets.
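
To make the onset F1 figure more concrete, the following is a minimal sketch of how such a score could be computed. It assumes that a predicted onset counts as correct only when its quantized position exactly matches a reference onset, with repeated positions handled as a multiset. This matching rule and the function name `onset_f1` are illustrative assumptions; the listing does not state the paper's exact metric definition.

```python
from collections import Counter
from fractions import Fraction


def onset_f1(predicted, reference):
    """Onset F1 under an ASSUMED exact-match rule on quantized positions.

    Both arguments are iterables of quantized onset positions (in beats).
    Fractions are used so that positions such as 1/3 compare exactly.
    """
    pred, ref = Counter(predicted), Counter(reference)
    # Multiset intersection: each reference onset can be matched at most once.
    true_pos = sum((pred & ref).values())
    if true_pos == 0:
        return 0.0
    precision = true_pos / sum(pred.values())
    recall = true_pos / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical example data: three of four onsets land on the right position.
reference = [Fraction(0), Fraction(1, 2), Fraction(1), Fraction(3, 2)]
predicted = [Fraction(0), Fraction(1, 2), Fraction(1), Fraction(7, 4)]
score = onset_f1(predicted, reference)
```

Using a multiset rather than a set keeps chords (several notes on one onset) from being collapsed into a single match, which would otherwise inflate the score.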

Abstract

Rhythm transcription is a key subtask of notation-level Automatic Music Transcription (AMT). While deep learning models have been extensively used for detecting the metrical grid in audio and MIDI performances, beat-based rhythm quantization remains largely unexplored. In this work, we introduce a novel deep learning approach for quantizing MIDI performances using a priori beat information. Our method leverages the transformer architecture to effectively process synchronized score and performance data for training a quantization model. Key components of our approach include dataset preparation, a beat-based pre-quantization method to align performance and score times within a unified framework, and a MIDI tokenizer tailored for this task. We adapt a transformer model based on the T5 architecture to meet the specific requirements of rhythm quantization. The model is evaluated using a set…
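
The idea of using a priori beat information to bring performance times into a score-like time frame can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes that onset times in seconds are mapped to fractional beat positions by linear interpolation between annotated beat times, and it adds a naive grid-rounding baseline (`snap_to_grid`) purely for comparison with what a learned quantizer would replace. All function names and the example values are hypothetical.

```python
from bisect import bisect_right


def seconds_to_beats(onsets, beat_times):
    """Map onset times (seconds) to fractional beat positions.

    ASSUMPTION: tempo is treated as constant between two adjacent beat
    annotations, so positions are linearly interpolated within each interval.
    """
    if len(beat_times) < 2:
        raise ValueError("need at least two beat annotations")
    positions = []
    for t in onsets:
        # Find the beat interval containing t; clamp so that onsets outside
        # the annotated range are extrapolated from the nearest interval.
        i = bisect_right(beat_times, t) - 1
        i = max(0, min(i, len(beat_times) - 2))
        start, end = beat_times[i], beat_times[i + 1]
        positions.append(i + (t - start) / (end - start))
    return positions


def snap_to_grid(positions, subdivisions=4):
    """Naive baseline: round each beat position to the nearest grid point."""
    return [round(p * subdivisions) / subdivisions for p in positions]


# Hypothetical beat annotations with a slight tempo change, in seconds.
beats = [0.0, 0.5, 1.1, 1.6]
onsets = [0.0, 0.25, 0.8, 1.35]
beat_positions = seconds_to_beats(onsets, beats)
snapped = snap_to_grid([0.0, 0.52, 1.49, 2.26], subdivisions=4)
```

Expressing onsets in beats removes tempo fluctuation from the input, so a downstream model only has to resolve the position of each note within the beat rather than infer the metrical grid itself.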
