Precise and Simple Audio-to-Score Alignment
Silvan Peter, Patricia Hu, Gerhard Widmer

TL;DR
This paper introduces a precise and flexible audio-to-score alignment algorithm that uses dynamic programming to match audio features directly to symbolic score positions, surpassing widely used audio-to-audio alignment methods in precision.
Contribution
The authors present a new alignment algorithm that bridges audio-like features and symbolic scores directly, avoiding the need for transcription or synthesis, with improved accuracy and efficiency.
Findings
Surpasses widely used audio-to-audio alignment methods in precision.
Maintains linear complexity in score and audio sequence lengths.
Demonstrates effectiveness on a large-scale solo piano dataset.
Abstract
Audio-to-score alignment is a long-standing challenge in music information retrieval and arguably the most widely applicable alignment task for music research. Alignment algorithms match two versions of a piece of music, and for this to work these versions need to be in comparable formats. Audio-to-audio alignment matches audio features; when matching audio files to scores, such methods must either synthesize the score or derive audio-like features from it by means of piano rolls or similar feature sequences. Symbolic alignment, by contrast, matches symbolically encoded notes; in an audio-to-score scenario these would be obtained by a transcription of the audio file. In this article, we present an algorithm that bridges audio-like and symbol-level features directly. Sequential audio features encoding onset and spectral activation are matched to score positions by a bespoke dynamic programming-based…
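To make the idea of matching frame-wise audio features to score positions concrete, here is a minimal sketch of a generic monotonic dynamic-programming alignment. It is not the authors' algorithm, whose details are not given in this summary. It assumes each audio frame is a dictionary of pitch activations in [0, 1], each score position is a non-empty set of MIDI pitches, and the cost of a frame at a position is one minus the mean activation of the expected pitches. The function name `align_frames_to_score` and the cost function are hypothetical, and the onset feature mentioned in the abstract is left out for brevity.

```python
def align_frames_to_score(frames, score):
    """Assign every audio frame to one score position, monotonically.

    frames: list of dicts {midi_pitch: activation in [0, 1]}, one per frame.
    score:  list of non-empty sets of MIDI pitches, one per score position.
    Returns the index of the first frame assigned to each score position.
    All names and the cost function are illustrative assumptions.
    """
    n_frames, n_pos = len(frames), len(score)
    if n_pos == 0 or n_frames < n_pos:
        raise ValueError("need at least one frame per score position")
    if any(len(pitches) == 0 for pitches in score):
        raise ValueError("every score position needs at least one pitch")

    inf = float("inf")

    def cost(frame, pitches):
        # Low cost when the pitches expected by the score are active.
        expected = sum(frame.get(p, 0.0) for p in pitches) / len(pitches)
        return 1.0 - expected

    # acc[t][j]: cheapest cost of aligning frames 0..t with frame t at position j.
    acc = [[inf] * n_pos for _ in range(n_frames)]
    back = [[0] * n_pos for _ in range(n_frames)]
    acc[0][0] = cost(frames[0], score[0])

    for t in range(1, n_frames):
        # Position j can be reached by frame t only if j <= t.
        for j in range(min(t, n_pos - 1) + 1):
            stay = acc[t - 1][j]                         # remain at position j
            advance = acc[t - 1][j - 1] if j > 0 else inf  # step from j - 1
            if advance < stay:
                best, back[t][j] = advance, j - 1
            else:
                best, back[t][j] = stay, j
            acc[t][j] = cost(frames[t], score[j]) + best

    # Backtrack from the last frame at the last score position.
    path = [0] * n_frames
    j = n_pos - 1
    for t in range(n_frames - 1, -1, -1):
        path[t] = j
        j = back[t][j]

    # The onset of a position is the first frame assigned to it.
    return [path.index(pos) for pos in range(n_pos)]
```

Called with a list of activation frames and a list of pitch sets, the function returns one frame index per score position; multiplying by the frame hop time would turn these into onset times. The table in this sketch has one cell per frame and score position pair, so it says nothing about the complexity of the method described in the paper.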
