Toward a More Complete OMR Solution
Guang Yang (1), Muru Zhang (1), Lin Qiu (1), Yanming Wan (1), Noah A. Smith (1, 2) ((1) Paul G. Allen School of Computer Science & Engineering, University of Washington, United States, (2) Allen Institute for Artificial Intelligence, United States)

TL;DR
This paper advances optical music recognition by integrating detection and assembly stages, using a YOLOv8-based detector and a supervised assembly pipeline, demonstrating improved performance on the MUSCIMA++ v2.0 dataset.
Contribution
It introduces a YOLOv8-based music object detector and a supervised notation assembly pipeline that operates on the detector's output, so that the detection and assembly stages of OMR are considered together.
Findings
The combined model outperforms existing models trained on perfect detection.
The holistic approach improves overall OMR accuracy.
A novel evaluation metric is proposed for better assessment.
Abstract
Optical music recognition (OMR) aims to convert music notation into digital formats. One approach to tackle OMR is through a multi-stage pipeline, where the system first detects visual music notation elements in the image (object detection) and then assembles them into a music notation (notation assembly). Most previous work on notation assembly unrealistically assumes perfect object detection. In this study, we focus on the MUSCIMA++ v2.0 dataset, which represents musical notation as a graph with pairwise relationships among detected music objects, and we consider both stages together. First, we introduce a music object detector based on YOLOv8, which improves detection performance. Second, we introduce a supervised training pipeline that completes the notation assembly stage based on detection output. We find that this model is able to outperform existing models trained on perfect…
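To make the two-stage formulation concrete, here is a minimal Python sketch of notation assembly as pairwise edge prediction over detected objects. It is not the authors' pipeline: the `DetectedObject` schema, the distance-based candidate filter, the `toy_edge_score` rule, and the example labels and coordinates are all assumptions made for illustration. In a supervised setup, a trained pairwise classifier would take the place of `toy_edge_score`.

```python
from dataclasses import dataclass
from itertools import permutations
from math import hypot


@dataclass(frozen=True)
class DetectedObject:
    """One detector output: class label plus bounding box (hypothetical schema)."""
    obj_id: int
    label: str
    top: float
    left: float
    bottom: float
    right: float

    @property
    def center(self):
        # (x, y) center of the bounding box
        return ((self.left + self.right) / 2, (self.top + self.bottom) / 2)


def candidate_pairs(objects, max_dist):
    """Return ordered (source, target) pairs whose centers lie within max_dist."""
    pairs = []
    for a, b in permutations(objects, 2):
        (ax, ay), (bx, by) = a.center, b.center
        if hypot(ax - bx, ay - by) <= max_dist:
            pairs.append((a, b))
    return pairs


def toy_edge_score(src, dst):
    """Placeholder for a trained pairwise classifier (assumption, not the paper's model)."""
    return 1.0 if (src.label, dst.label) == ("noteheadFull", "stem") else 0.0


def assemble(objects, max_dist=50.0, threshold=0.5):
    """Build the notation graph as a set of directed edges between object ids."""
    return {
        (src.obj_id, dst.obj_id)
        for src, dst in candidate_pairs(objects, max_dist)
        if toy_edge_score(src, dst) >= threshold
    }


# Illustrative detections: a notehead with its stem, plus a distant notehead.
detections = [
    DetectedObject(0, "noteheadFull", top=100, left=40, bottom=110, right=52),
    DetectedObject(1, "stem", top=60, left=51, bottom=105, right=53),
    DetectedObject(2, "noteheadFull", top=100, left=400, bottom=110, right=412),
]
edges = assemble(detections)
print(edges)  # {(0, 1)}
```

Because the graph is built from detector output rather than ground-truth objects, any missed or spurious detection changes the set of candidate pairs, which is why the two stages are evaluated together.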
Taxonomy
Topics: Music and Audio Processing · Handwritten Text Recognition Techniques · Music Technology and Sound Studies
Methods: Focus · You Only Look Once
