A Dataset for the Recognition of Historical and Handwritten Music Scores in Western Notation

Pau Torras; Jiří Mayer; Carles Badal; Martina Dvořáková; Markéta Herzanová Vlková; Gerard Asbert; Vojtěch Dvořák; Samuel Šomorjai; Jan Hajič jr.; Alicia Fornés

arXiv:2605.18436·cs.CV·May 19, 2026

TL;DR

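The paper introduces MusiCorpus, a dataset of 1,309 pages of historical, primarily handwritten sheet music with MusicXML transcriptions and symbol annotations. It is the largest handwritten music dataset to date and is intended for training and evaluating optical music recognition (OMR) systems under realistic conditions.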

Contribution

It provides 1,309 pages of historical, primarily handwritten music scores with MusicXML transcriptions and symbol annotations, suitable for training and evaluating OMR systems.

Findings

01

First dataset containing a realistic and representative sample of the musical document collections held by memory institutions.

02

Enables training of end-to-end and object detection-based OMR systems.

03

Facilitates performance comparison of different OMR approaches.

Abstract

A large amount of musical heritage has been digitised by memory institutions: libraries, museums, and archives. Nevertheless, the field of Optical Music Recognition (OMR) has struggled with making this music machine-readable, despite advances in deep learning, mostly because no datasets for training systems in realistic conditions were available. The MusiCorpus dataset aims to remedy this situation by providing 1,309 pages of historical sheet music, primarily handwritten, with MusicXML transcriptions and symbol annotations. It is the largest dataset of handwritten music to date and the first dataset containing a realistic and representative sample of musical document collections from memory institutions, suitable for training and evaluating both end-to-end and object detection-based OMR systems and comparing their performance.
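
To make the transcription format mentioned in the abstract more concrete, the following is a minimal sketch of reading pitched notes from a MusicXML document with Python's standard library. The sample fragment, the function name `extract_notes`, and the choice of output fields are assumptions made for illustration; they do not describe the actual MusiCorpus file layout or any tooling released with the dataset.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-measure fragment in MusicXML's score-partwise layout.
# It is illustrative only and is not taken from MusiCorpus.
SAMPLE = """
<score-partwise version="4.0">
  <part-list>
    <score-part id="P1"><part-name>Voice</part-name></score-part>
  </part-list>
  <part id="P1">
    <measure number="1">
      <attributes><divisions>1</divisions></attributes>
      <note>
        <pitch><step>C</step><octave>4</octave></pitch>
        <duration>2</duration>
      </note>
      <note>
        <rest/>
        <duration>2</duration>
      </note>
    </measure>
    <measure number="2">
      <note>
        <pitch><step>F</step><alter>1</alter><octave>4</octave></pitch>
        <duration>4</duration>
      </note>
    </measure>
  </part>
</score-partwise>
"""


def extract_notes(musicxml_text):
    """Return (measure_number, pitch_name, duration) for each pitched note.

    Rests and unpitched notes are skipped. Assumes integer <alter> values
    (no microtones) and that every pitched note carries a <duration>.
    """
    root = ET.fromstring(musicxml_text)
    notes = []
    for measure in root.iter("measure"):
        for note in measure.findall("note"):
            pitch = note.find("pitch")
            if pitch is None:  # a rest or an unpitched note
                continue
            step = pitch.findtext("step")
            alter = int(pitch.findtext("alter", default="0"))
            octave = pitch.findtext("octave")
            # Render sharps as '#' and flats as 'b', one per semitone.
            accidental = "#" * alter if alter > 0 else "b" * -alter
            duration = int(note.findtext("duration"))
            notes.append(
                (measure.get("number"), f"{step}{accidental}{octave}", duration)
            )
    return notes


if __name__ == "__main__":
    for measure_number, pitch_name, duration in extract_notes(SAMPLE):
        print(measure_number, pitch_name, duration)
```

A flat note sequence like this is one plausible target representation for an end-to-end system, whereas an object detection-based system would instead be trained on the symbol annotations; how the dataset links the two is not stated in the abstract.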
