Precise and Simple Audio-to-Score Alignment

Silvan Peter; Patricia Hu; Gerhard Widmer

arXiv:2605.20014 · cs.SD · May 20, 2026

TL;DR

This paper introduces a novel, precise, and flexible audio-to-score alignment algorithm that uses dynamic programming to match audio features directly to symbolic scores, and it outperforms widely used audio-to-audio alignment methods.

Contribution

The authors present a new alignment algorithm that matches audio-like features to symbolic scores directly, so the audio need not be transcribed and the score need not be synthesized, while improving accuracy and efficiency.
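
Matching audio features "directly" to a score needs a symbolic representation that those features can be compared against. The abstract speaks of "score positions". The sketch below is a minimal illustration, assuming notes arrive as (onset in beats, MIDI pitch) pairs and that all notes sharing an onset form one score position. The helper names `score_positions` and `pitch_class_template`, and the 12-bin pitch-class template, are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def score_positions(notes):
    """Group (onset_beat, midi_pitch) pairs into ordered score positions.

    Hypothetical helper: each score position is one distinct onset time
    together with the sorted pitches that start at that time.
    """
    groups = defaultdict(set)
    for onset, pitch in notes:
        groups[onset].add(pitch)
    # Sort by onset so positions follow the score's time order.
    return [(onset, sorted(groups[onset])) for onset in sorted(groups)]


def pitch_class_template(pitches, size=12):
    """Binary pitch-class vector (assumed 12 bins) for one score position."""
    vec = [0.0] * size
    for p in pitches:
        vec[p % size] = 1.0
    return vec


if __name__ == "__main__":
    notes = [(0.0, 60), (0.0, 64), (0.0, 67), (1.0, 67), (2.0, 65), (2.0, 69)]
    for onset, pitches in score_positions(notes):
        print(onset, pitches, pitch_class_template(pitches))
```

A C major chord at beat 0 becomes one score position with pitches `[60, 64, 67]` and active pitch-class bins 0, 4 and 7. Coarse pitch classes are used only to keep the example short; an 88-key template would work the same way.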

Findings

1. Surpasses widely used audio-to-audio alignment methods in precision.
2. Maintains linear complexity in score and audio sequence lengths.
3. Demonstrates effectiveness on a large-scale solo piano dataset.
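
Alignment precision is commonly reported as the share of notes whose aligned onset lies within a fixed time tolerance of a reference annotation. This excerpt does not state which measure or tolerance the authors use. The following is therefore only an illustrative sketch of that common kind of measure. The function name and the 50 ms default are assumptions.

```python
def fraction_within_tolerance(predicted, reference, tolerance=0.05):
    """Share of notes whose predicted onset (seconds) lies within
    `tolerance` seconds of the reference onset.

    Assumed measure for illustration; not necessarily the paper's metric.
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need two equally long, non-empty onset lists")
    # Count notes whose absolute timing error does not exceed the tolerance.
    hits = sum(abs(p - r) <= tolerance for p, r in zip(predicted, reference))
    return hits / len(reference)


if __name__ == "__main__":
    predicted = [0.00, 0.52, 1.10, 1.49]
    reference = [0.00, 0.50, 1.00, 1.50]
    print(fraction_within_tolerance(predicted, reference))
```

With these toy onsets, three of the four notes fall within 50 ms, so the function returns 0.75. Evaluating several tolerances (for example 25, 50 and 100 ms) shows how quickly a method's accuracy degrades as the window tightens.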

Abstract

Audio-to-score alignment is a long-standing challenge in music information retrieval and arguably the most widely applicable alignment task for music research. Alignment algorithms match two versions of a piece of music, and for this to work these versions need to be in comparable formats. Audio-to-audio alignment matches audio features; when matching audio files to scores, they must either synthesize the score or derive audio-like features by means of piano rolls or similar feature sequences. Symbolic alignment, by contrast, matches symbolically encoded notes; in an audio-to-score scenario these would be obtained by a transcription of the audio file. In this article, we present an algorithm that bridges audio-like and symbol-level features directly. Sequential audio features encoding onset and spectral activation are matched to score positions by a bespoke dynamic programming-based…
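
The abstract breaks off before it describes the bespoke dynamic programming procedure, so the following is not the authors' algorithm. It is a generic, DTW-style sketch of the underlying idea: matching frame-wise onset and spectral activation features to score positions under a monotonic constraint. Every concrete choice here is an assumption. That includes the function name `align_frames_to_score`, the 12-bin activation vectors, the cosine distance, and the `onset_weight` bonus, which makes advancing to the next score position cheaper on frames with a strong onset.

```python
import numpy as np


def align_frames_to_score(onset, activation, templates, onset_weight=1.0):
    """Assign every audio frame to a score position with a monotonic DP.

    Illustrative sketch, not the paper's method.
    onset:      (M,) onset strength per audio frame (assumed feature)
    activation: (M, P) spectral/pitch activation per frame (assumed feature)
    templates:  (N, P) expected activation for each score position
    Returns a list of M score-position indices, non-decreasing, step <= 1.
    """
    onset = np.asarray(onset, dtype=float)
    activation = np.asarray(activation, dtype=float)
    templates = np.asarray(templates, dtype=float)
    M, N = activation.shape[0], templates.shape[0]
    if M < N:
        raise ValueError("need at least one audio frame per score position")

    # Local cost: cosine distance between each frame and each template.
    a = activation / (np.linalg.norm(activation, axis=1, keepdims=True) + 1e-9)
    t = templates / (np.linalg.norm(templates, axis=1, keepdims=True) + 1e-9)
    local = 1.0 - a @ t.T  # shape (M, N)

    D = np.full((M, N), np.inf)          # accumulated cost
    back = np.zeros((M, N), dtype=int)   # 1 = advanced from j-1, 0 = stayed
    D[0, 0] = local[0, 0]                # the path starts at the first position
    for i in range(1, M):
        for j in range(N):
            stay = D[i - 1, j]
            # Entering a new score position is rewarded on onset frames.
            advance = D[i - 1, j - 1] - onset_weight * onset[i] if j > 0 else np.inf
            if advance < stay:
                D[i, j], back[i, j] = advance + local[i, j], 1
            else:
                D[i, j], back[i, j] = stay + local[i, j], 0

    # Backtrack from the last frame, which must sit on the last position.
    path, j = [], N - 1
    for i in range(M - 1, -1, -1):
        path.append(j)
        if i > 0:
            j -= int(back[i, j])
    return path[::-1]


if __name__ == "__main__":
    templates = np.zeros((3, 12))
    templates[0, 0] = templates[1, 7] = templates[2, 5] = 1.0  # C, G, F
    frames = np.zeros((7, 12))
    for i, pc in enumerate([0, 0, 7, 7, 7, 5, 5]):
        frames[i, pc] = 1.0
    onsets = np.array([1, 0, 1, 0, 0, 1, 0], dtype=float)
    print(align_frames_to_score(onsets, frames, templates))
```

Consider three one-hot templates (C, G, F) and seven frames whose activations follow C C G G G F F, with onsets at frames 0, 2 and 5. The function returns `[0, 0, 1, 1, 1, 2, 2]`, mapping each frame to the score position it sounds. This naive version fills the full M-by-N table.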
