TheGlueNote: Learned Representations for Robust and Flexible Note Alignment
Silvan David Peter, Gerhard Widmer

TL;DR
TheGlueNote introduces a transformer-based model that learns note representations to make the alignment of notes between two versions of the same piece more robust and flexible, handling complex mismatches (such as repeats, skips, and insertions) better than traditional sequence alignment methods.
Contribution
It presents a novel transformer encoder approach to note alignment that works directly on MIDI files and is more robust to large mismatches between versions than traditional sequence alignment methods such as Hidden Markov Models and Dynamic Time Warping.
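The encoder's output is a matrix of pairwise note similarities between two versions. The following is a minimal sketch of that interface only, assuming each note has already been mapped to an embedding vector: toy vectors stand in for learned ones, and scoring pairs with cosine similarity is an assumption, not necessarily how TheGlueNote computes its scores. The function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(emb_a: List[Vector], emb_b: List[Vector]) -> List[List[float]]:
    """Entry [i][j] scores how well note i of version A matches note j of version B.

    Assumption: cosine similarity of per-note embeddings stands in for the
    learned pairwise similarities predicted by the model.
    """
    return [[cosine(u, v) for v in emb_b] for u in emb_a]


# Usage with toy 2-D "embeddings" for a 2-note and a 3-note version.
sim = similarity_matrix([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
```

The result is an n x m matrix regardless of how the embeddings were produced, which is what the downstream DTW-based postprocessing consumes.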
Findings
Performs on par with the state of the art in accuracy
More robust to version mismatches
Works directly on MIDI files
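The robustness to version mismatches comes, according to the abstract, from training on data augmented with complex mismatch cases such as repeats, skips, block insertions, and long trills. A minimal sketch of such augmentation, assuming a version is a list of pitches and that every perturbed note remembers which reference note it came from, so ground-truth matches fall out automatically. Treating a repeated block and all but the first trill note as unmatched is a hypothetical labeling choice, and all function names are illustrative, not the authors' code.

```python
from typing import List, Optional, Tuple

# (pitch, index of the source note in the reference version, or None if unmatched)
Tagged = List[Tuple[int, Optional[int]]]


def tag(pitches: List[int]) -> Tagged:
    """Start from an unperturbed copy in which every note matches itself."""
    return [(p, i) for i, p in enumerate(pitches)]


def skip(seq: Tagged, start: int, length: int) -> Tagged:
    """Drop a block of notes, as when a performer skips a passage."""
    return seq[:start] + seq[start + length:]


def repeat(seq: Tagged, start: int, length: int) -> Tagged:
    """Play a block twice; the second copy is (by assumption) left unmatched."""
    end = start + length
    copy = [(p, None) for p, _ in seq[start:end]]
    return seq[:end] + copy + seq[end:]


def insert_block(seq: Tagged, at: int, pitches: List[int]) -> Tagged:
    """Insert extra notes that have no counterpart in the reference."""
    return seq[:at] + [(p, None) for p in pitches] + seq[at:]


def trill(seq: Tagged, at: int, upper: int, n: int) -> Tagged:
    """Replace note `at` by an n-note trill; only the first trill note keeps the match."""
    pitch, src = seq[at]
    notes = [(pitch if k % 2 == 0 else upper, src if k == 0 else None) for k in range(n)]
    return seq[:at] + notes + seq[at + 1:]


def ground_truth(seq: Tagged) -> List[Tuple[int, int]]:
    """(reference index, perturbed index) pairs for all matched notes."""
    return [(src, j) for j, (_, src) in enumerate(seq) if src is not None]


# Usage: repeat notes 1-2, then skip the final note.
ref = [60, 62, 64, 65, 67]
perturbed = skip(repeat(tag(ref), 1, 2), 6, 1)
pairs = ground_truth(perturbed)
```

Because provenance travels with each note, perturbations can be chained in any order and the match labels stay consistent.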
Abstract
Note alignment refers to the task of matching individual notes of two versions of the same symbolically encoded piece. Methods addressing this task commonly rely on sequence alignment algorithms such as Hidden Markov Models or Dynamic Time Warping (DTW) applied directly to note or onset sequences. While successful in many cases, such methods struggle with large mismatches between the versions. In this work, we learn note-wise representations from data augmented with various complex mismatch cases, e.g., repeats, skips, block insertions, and long trills. At the heart of our approach lies a transformer encoder network, TheGlueNote, which predicts pairwise note similarities for two 512-note subsequences. We postprocess the predicted similarities using flavors of weightedDTW and pitch-separated onsetDTW to retrieve note matches for two sequences of arbitrary length. Our approach performs…
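The postprocessing step turns a similarity matrix into discrete note matches. The abstract does not spell out its weightedDTW variants, so the following is only a minimal sketch using plain DTW (a swapped-in, generic technique) on the cost 1 - similarity, followed by a hypothetical `threshold` and a one-to-one filter, since a DTW path can pair one note with several others. Function names and the threshold value are assumptions.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def dtw_path(cost: Matrix) -> List[Tuple[int, int]]:
    """Classic DTW over an n x m cost matrix; returns the optimal warping path."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    # acc[i][j] = minimal accumulated cost of aligning the first i and j notes.
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i][j] = cost[i - 1][j - 1] + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    # Backtrack from the bottom-right corner to the first cell.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = [(acc[i - 1][j - 1], i - 1, j - 1), (acc[i - 1][j], i - 1, j), (acc[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
        path.append((i - 1, j - 1))
    path.reverse()
    return path


def matches_from_similarity(sim: Matrix, threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Run DTW on cost = 1 - similarity, then keep confident one-to-one pairs."""
    cost = [[1.0 - s for s in row] for row in sim]
    used_a, used_b = set(), set()
    matches = []
    for i, j in dtw_path(cost):
        # Keep a path cell only if it is confident and neither note is already matched.
        if sim[i][j] >= threshold and i not in used_a and j not in used_b:
            matches.append((i, j))
            used_a.add(i)
            used_b.add(j)
    return matches


# Usage: the middle note of version A (e.g. an inserted note) has no good partner.
pairs = matches_from_similarity([[0.9, 0.0], [0.1, 0.2], [0.0, 0.9]])
```

In the usage example the path passes through the low-similarity cell for the middle note, and the threshold leaves that note unmatched. Windowing into 512-note subsequences and the pitch-separated onsetDTW are not covered by this sketch.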
