Predicting performance difficulty from piano sheet music images
Pedro Ramoneda, Jose J. Valero-Mas, Dasaem Jeong, Xavier Serra

TL;DR
This paper introduces a transformer-based method that uses a mid-level representation (the bootleg score) to predict performance difficulty from sheet music images, evaluated on five datasets totaling more than 7500 scores.
Contribution
It presents a novel approach combining bootleg score representations with a sequence encoding scheme and transformer models for difficulty prediction from sheet music images.
Findings
Achieved a balanced accuracy of 40.34%
Attained a mean squared error of 1.33
Validated approach on five datasets with over 7500 scores
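For readers unfamiliar with the two reported metrics, here is a minimal sketch of how balanced accuracy (the mean of per-class recall) and mean squared error are commonly computed for ordinal difficulty labels. The function names and toy labels are illustrative assumptions, not taken from the paper, and the paper's exact evaluation protocol may differ.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so rare difficulty levels count equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


def mean_squared_error(y_true, y_pred):
    """Average squared distance between true and predicted level indices."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy difficulty levels (hypothetical data, for illustration only).
y_true = [1, 1, 1, 2, 2, 3]
y_pred = [1, 1, 2, 2, 3, 3]

print(f"balanced accuracy: {balanced_accuracy(y_true, y_pred):.4f}")
print(f"mean squared error: {mean_squared_error(y_true, y_pred):.4f}")
```

Balanced accuracy weights every difficulty level equally regardless of how many scores it contains, while mean squared error penalizes predictions more heavily the further they fall from the true level.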
Abstract
Estimating the performance difficulty of a musical score is crucial in music education for adequately designing students' learning curricula. Although the Music Information Retrieval community has recently shown interest in this task, existing approaches mainly use machine-readable scores, leaving the broader case of sheet music images unaddressed. Building on previous work with sheet music images, we use a mid-level representation, the bootleg score, which describes notehead positions relative to staff lines, coupled with a transformer model. This architecture is adapted to our task by introducing an encoding scheme that reduces the encoded sequence length to one-eighth of the original size. For evaluation, we consider five datasets (more than 7500 scores with up to 9 difficulty levels), two of them compiled specifically for this work. The results obtained when…
Taxonomy
Topics: Music and Audio Processing · Diverse Musicological Studies · Music Education and Analysis
