Towards Musically Informed Evaluation of Piano Transcription Models
Patricia Hu, Lukáš Samuel Marták, Carlos Cancino-Chacón, Gerhard Widmer

TL;DR
This paper introduces musically informed evaluation metrics for piano transcription models, providing deeper insights into musical quality aspects and analyzing model performance on real-world versus perturbed audio data.
Contribution
It proposes new evaluation metrics that assess musical qualities beyond traditional IR metrics and compares model robustness across different audio conditions.
Findings
Existing IR metrics lack musical insight
New metrics reveal weaknesses in current models
Models perform worse on real-world and perturbed data
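For context, the note-wise IR metric that these findings contrast against is usually an onset-based precision/recall/F1 score. The sketch below is illustrative only and is not taken from the paper: it assumes notes are given as (onset in seconds, MIDI pitch) pairs, uses the commonly cited 50 ms onset tolerance, and replaces the optimal bipartite matching found in standard toolkits with a simpler greedy match. The function name `note_onset_f1` is hypothetical.

```python
from typing import List, Tuple

# Assumed note representation: (onset time in seconds, MIDI pitch).
Note = Tuple[float, int]


def note_onset_f1(
    reference: List[Note],
    estimate: List[Note],
    tolerance: float = 0.05,  # assumed 50 ms onset tolerance
) -> Tuple[float, float, float]:
    """Greedy note-wise precision, recall and F1 based on onset and pitch only."""
    matched_ref = set()
    true_positives = 0
    for est_onset, est_pitch in sorted(estimate):
        best = None  # (onset difference, reference index) of closest free match
        for i, (ref_onset, ref_pitch) in enumerate(reference):
            if i in matched_ref or ref_pitch != est_pitch:
                continue
            diff = abs(ref_onset - est_onset)
            if diff <= tolerance and (best is None or diff < best[0]):
                best = (diff, i)
        if best is not None:
            matched_ref.add(best[1])
            true_positives += 1
    precision = true_positives / len(estimate) if estimate else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1


reference = [(0.0, 60), (0.5, 64), (1.0, 67)]
estimate = [(0.01, 60), (0.52, 64), (1.2, 67), (1.5, 72)]
precision, recall, f1 = note_onset_f1(reference, estimate)
```

Note that a score of this kind only records whether a pitch was detected near the right onset. It carries no information about note duration or loudness, which is consistent with the point that such metrics say little about articulation or dynamics.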
Abstract
Automatic piano transcription models are typically evaluated using simple frame- or note-wise information retrieval (IR) metrics. Such benchmark metrics do not provide insights into the transcription quality of specific musical aspects such as articulation, dynamics, or rhythmic precision of the output, which are essential in the context of expressive performance analysis. Furthermore, in recent years, MAESTRO has become the de facto training and evaluation dataset for such models. However, inference performance has been observed to deteriorate substantially when these models are applied to out-of-distribution data, which calls into question the suitability and reliability of their transcribed outputs for specific MIR tasks. In this work, we investigate the performance of three state-of-the-art piano transcription models in two experiments. In the first one, we propose a variety of musically…
Taxonomy
Topics: Music and Audio Processing · Diverse Musicological Studies · Music Technology and Sound Studies
