Beat and Downbeat Tracking in Performance MIDI Using an End-to-End Transformer Architecture
Sebastian Murgul, Michael Heizmann

TL;DR
This paper introduces an end-to-end transformer model for beat and downbeat tracking in MIDI performances, outperforming existing methods and demonstrating the potential of transformers for symbolic music analysis.
Contribution
It presents a novel transformer-based approach with advanced data preprocessing for improved beat tracking accuracy in MIDI data.
Findings
Outperforms state-of-the-art symbolic beat tracking methods
Achieves high F1-scores across diverse datasets and musical styles
Demonstrates the effectiveness of transformer architectures for music rhythm analysis
Abstract
Beat tracking in musical performance MIDI is a challenging and important task for notation-level music transcription and rhythmical analysis, yet existing methods primarily focus on audio-based approaches. This paper proposes an end-to-end transformer-based model for beat and downbeat tracking in performance MIDI, leveraging an encoder-decoder architecture for sequence-to-sequence translation of MIDI input to beat annotations. Our approach introduces novel data preprocessing techniques, including dynamic augmentation and optimized tokenization strategies, to improve accuracy and generalizability across different datasets. We conduct extensive experiments using the A-MAPS, ASAP, GuitarSet, and Leduc datasets, comparing our model against state-of-the-art hidden Markov models (HMMs) and deep learning-based beat tracking methods. The results demonstrate that our model outperforms existing…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMusic and Audio Processing · Music Technology and Sound Studies · Neuroscience and Music Perception
