Just Label the Repeats for In-The-Wild Audio-to-Score Alignment
Irmak Bukey, Michael Feffer, Chris Donahue

TL;DR
This paper introduces a semi-automatic workflow for high-quality in-the-wild audio-to-score alignment, combining minimal human annotation with improved feature representations, significantly outperforming prior methods.
Contribution
It presents a new annotation workflow and refined features that together enhance alignment accuracy for in-the-wild audio and sheet music scans.
Findings
Alignment accuracy rose from 33% to 82%, a relative improvement of roughly 150%
Human supervision reduces errors compared to fully automatic methods
Refined features with measure detection and raw onset probabilities enhance performance
Abstract
We propose an efficient workflow for high-quality offline alignment of in-the-wild performance audio and corresponding sheet music scans (images). Recent work on audio-to-score alignment extends dynamic time warping (DTW) to be theoretically able to handle jumps in sheet music induced by repeat signs. This method requires no human annotations, but we show that it often yields low-quality alignments. As an alternative, we propose a workflow and interface that allows users to quickly annotate jumps (by clicking on repeat signs), requiring a small amount of human supervision but yielding much higher quality alignments on average. Additionally, we refine audio and score feature representations to improve alignment quality by: (1) integrating measure detection into the score feature representation, and (2) using raw onset prediction probabilities from a music transcription model instead of…
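The idea of annotating repeats and then aligning can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each repeat is given as an inclusive (start, end) measure range played exactly twice, uses one scalar feature per measure and per audio frame as a toy stand-in for real features, and runs plain DTW on the unrolled score. The names `unroll_measures`, `dtw_path`, and `align_with_repeats` are hypothetical.

```python
def unroll_measures(num_measures, repeats):
    """Expand measure indices into playing order.

    Assumption: `repeats` is a list of (start, end) inclusive, 0-based
    measure ranges, each played exactly twice.
    """
    jump_back = {end: start for start, end in repeats}
    taken = set()  # repeat endings whose jump has already been followed
    order = []
    m = 0
    while m < num_measures:
        order.append(m)
        if m in jump_back and m not in taken:
            taken.add(m)
            m = jump_back[m]  # jump back to the start of the repeat
        else:
            m += 1
    return order


def dtw_path(cost):
    """Plain DTW over a cost matrix (list of rows); returns (i, j) pairs."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    acc = [[inf] * m for _ in range(n)]
    acc[0][0] = cost[0][0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best = inf
            if i > 0:
                best = min(best, acc[i - 1][j])
            if j > 0:
                best = min(best, acc[i][j - 1])
            if i > 0 and j > 0:
                best = min(best, acc[i - 1][j - 1])
            acc[i][j] = cost[i][j] + best
    # Backtrack from the end, always moving to the cheapest predecessor.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((acc[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((acc[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return path


def align_with_repeats(audio, score, repeats):
    """Map each audio frame index to an original (un-unrolled) measure index."""
    order = unroll_measures(len(score), repeats)
    unrolled = [score[k] for k in order]
    # Toy cost: absolute difference between scalar features.
    cost = [[abs(a - s) for s in unrolled] for a in audio]
    return [(i, order[j]) for i, j in dtw_path(cost)]


# Three written measures; the first two are repeated in performance.
alignment = align_with_repeats([1, 2, 1, 2, 3], [1, 2, 3], [(0, 1)])
```

Because the jumps are resolved before alignment, the DTW step itself stays standard and monotonic; the only human input in this sketch is the list of repeat ranges.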
Taxonomy
Topics: Music and Audio Processing · Diverse Musicological Studies · Music Technology and Sound Studies
