Transformer-Based Rhythm Quantization of Performance MIDI Using Beat Annotations
Maximilian Wachter, Sebastian Murgul, Michael Heizmann

TL;DR
This paper introduces a transformer-based deep learning framework for rhythm quantization of MIDI performances that uses beat annotations and data augmentation to improve accuracy and robustness.
Contribution
It presents a novel transformer architecture tailored for MIDI rhythm quantization, incorporating beat pre-quantization, a MIDI tokenizer, and augmentation techniques for improved performance.
Findings
Achieved 97.3% onset F1-score on the ASAP dataset.
Generalized well across unseen time signatures.
Fine-tuning improved performance on instrument-specific datasets.
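For readers unfamiliar with the metric, the sketch below shows one way an onset F1-score could be computed for quantized output. It is an illustration, not the paper's evaluation code: the function name `onset_f1` is hypothetical, and the assumption that a predicted onset counts as correct only when it exactly matches a score onset (in beats) is ours, since the matching criterion is not stated in this summary.

```python
from collections import Counter


def onset_f1(predicted, reference):
    """F1-score between two lists of quantized onset positions (in beats).

    Assumption: an onset is a true positive only on an exact position match.
    Counters are used so that repeated onsets (chords) are matched one-to-one.
    """
    pred_counts = Counter(predicted)
    ref_counts = Counter(reference)

    # Multiset intersection: each reference onset can be matched at most once.
    true_positives = sum((pred_counts & ref_counts).values())
    if true_positives == 0:
        return 0.0

    precision = true_positives / sum(pred_counts.values())
    recall = true_positives / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)
```

With four predicted onsets of which three match the four score onsets, precision and recall are both 0.75, and so is the F1-score.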
Abstract
Rhythm transcription is a key subtask of notation-level Automatic Music Transcription (AMT). While deep learning models have been extensively used for detecting the metrical grid in audio and MIDI performances, beat-based rhythm quantization remains largely unexplored. In this work, we introduce a novel deep learning approach for quantizing MIDI performances using a priori beat information. Our method leverages the transformer architecture to effectively process synchronized score and performance data for training a quantization model. Key components of our approach include dataset preparation, a beat-based pre-quantization method to align performance and score times within a unified framework, and a MIDI tokenizer tailored for this task. We adapt a transformer model based on the T5 architecture to meet the specific requirements of rhythm quantization. The model is evaluated using a set…
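The beat-based pre-quantization mentioned in the abstract maps performance times onto the same beat-relative time axis as the score. The sketch below illustrates the general idea only, under stated assumptions: the function names `seconds_to_beats` and `snap_to_grid`, the linear interpolation between neighbouring beat annotations, and the fixed grid of four subdivisions per beat are illustrative choices, not details taken from the paper.

```python
from bisect import bisect_right


def seconds_to_beats(onsets, beat_times):
    """Convert onset times in seconds to fractional beat positions.

    Assumption: tempo is constant between two consecutive beat annotations,
    so positions are linearly interpolated. Onsets before the first or after
    the last beat are extrapolated from the nearest beat interval.
    """
    if len(beat_times) < 2:
        raise ValueError("at least two beat annotations are required")

    last_interval = len(beat_times) - 2
    positions = []
    for t in onsets:
        # Index of the beat interval [beat_times[i], beat_times[i + 1]) holding t.
        i = bisect_right(beat_times, t) - 1
        i = max(0, min(i, last_interval))
        interval = beat_times[i + 1] - beat_times[i]
        positions.append(i + (t - beat_times[i]) / interval)
    return positions


def snap_to_grid(positions, subdivisions=4):
    """Round beat positions to the nearest 1/subdivisions of a beat."""
    return [round(p * subdivisions) / subdivisions for p in positions]
```

Expressing onsets in beats rather than seconds removes tempo variation from the input, so a downstream model only has to resolve the remaining timing deviations within each beat.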
