Knowledge Discovery in Optical Music Recognition: Enhancing Information Retrieval with Instance Segmentation
Elona Shatri, George Fazekas

TL;DR
This paper applies instance segmentation with Mask R-CNN to improve the detection and recognition of musical symbols in Optical Music Recognition, significantly enhancing accuracy in dense scores and aiding knowledge discovery.
Contribution
The paper introduces the use of instance segmentation for OMR, demonstrates improved detection accuracy, and provides publicly available tools for further research.
Findings
Achieved up to 59.70% mAP in dense symbol environments
Enhanced precision in dense musical scores
Provided publicly available implementation and models
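For readers unfamiliar with the metric, the mAP figure above is a mean of per-class Average Precision (AP) values. The summary does not say which evaluation protocol the authors used, so the following is only a minimal sketch assuming all-point interpolated AP at a single IoU threshold. The function names, class names, and toy detections are hypothetical and are not taken from the paper.

```python
import numpy as np

def average_precision(scores, is_tp, num_gt):
    """All-point interpolated AP for one class (assumed protocol, see lead-in).

    scores: confidence of each detection
    is_tp:  whether each detection matched a ground-truth symbol
    num_gt: number of ground-truth symbols of this class
    """
    if num_gt == 0:
        return 0.0
    # Rank detections from most to least confident.
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp = np.asarray(is_tp, dtype=bool)[order]
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(~tp)
    recall = cum_tp / num_gt
    precision = cum_tp / (cum_tp + cum_fp)
    # Pad the curve, then make precision monotonically non-increasing.
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    mpre = np.maximum.accumulate(mpre[::-1])[::-1]
    # Sum rectangle areas wherever recall changes.
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))

def mean_average_precision(per_class):
    """Mean of per-class APs; per_class maps class name -> (scores, is_tp, num_gt)."""
    aps = [average_precision(*args) for args in per_class.values()]
    return sum(aps) / len(aps)

# Toy, made-up detections for two hypothetical symbol classes.
toy = {
    "notehead": ([0.9, 0.8, 0.7], [True, False, True], 2),
    "stem": ([0.6], [True], 1),
}
ap_notehead = average_precision(*toy["notehead"])
toy_map = mean_average_precision(toy)
print(f"AP(notehead) = {ap_notehead:.4f}, mAP = {toy_map:.4f}")
```

COCO-style evaluation additionally averages over several IoU thresholds; that step is omitted here to keep the sketch short.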
Abstract
Optical Music Recognition (OMR) automates the transcription of musical notation from images into machine-readable formats like MusicXML, MEI, or MIDI, significantly reducing the costs and time of manual transcription. This study explores knowledge discovery in OMR by applying instance segmentation using Mask R-CNN to enhance the detection and delineation of musical symbols in sheet music. Unlike Optical Character Recognition (OCR), OMR must handle the intricate semantics of Common Western Music Notation (CWMN), where symbol meanings depend on shape, position, and context. Our approach leverages instance segmentation to manage the density and overlap of musical symbols, facilitating more precise information retrieval from music scores. Evaluations on the DoReMi and MUSCIMA++ datasets demonstrate substantial improvements, with our method achieving a mean Average Precision (mAP) of up to…
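To illustrate why pixel-level masks can help with dense, overlapping symbols, the sketch below compares bounding-box IoU with mask IoU on a toy example. It is not the authors' implementation: the two diagonal strokes, the grid size, and all function names are assumptions made for illustration. The strokes share the same bounding box but no pixels, so a box-based overlap measure cannot tell them apart while a mask-based one can.

```python
import numpy as np

def mask_iou(a, b):
    """Intersection over union of two binary masks."""
    a = a.astype(bool)
    b = b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(a, b).sum() / union)

def bounding_box(mask):
    """Tight box (y0, x0, y1, x1) around a non-empty mask, end-exclusive."""
    rows = np.where(np.any(mask, axis=1))[0]
    cols = np.where(np.any(mask, axis=0))[0]
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1

def box_iou(a, b):
    """Intersection over union of two (y0, x0, y1, x1) boxes."""
    ay0, ax0, ay1, ax1 = a
    by0, bx0, by1, bx1 = b
    ih = max(0, min(ay1, by1) - max(ay0, by0))
    iw = max(0, min(ax1, bx1) - max(ax0, bx0))
    inter = ih * iw
    union = (ay1 - ay0) * (ax1 - ax0) + (by1 - by0) * (bx1 - bx0) - inter
    return float(inter / union)

# Two hypothetical thin strokes crossing the same 10x10 region.
rising = np.fliplr(np.eye(10, dtype=bool))   # bottom-left to top-right
falling = np.eye(10, dtype=bool)             # top-left to bottom-right

box_overlap = box_iou(bounding_box(rising), bounding_box(falling))
mask_overlap = mask_iou(rising, falling)
print(f"box IoU = {box_overlap:.2f}, mask IoU = {mask_overlap:.2f}")
# prints "box IoU = 1.00, mask IoU = 0.00"
```

In a real score the effect is less extreme, but the same idea applies to symbols such as slurs, beams, and hairpins whose boxes enclose many neighbouring symbols.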
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing
Methods: Region Proposal Network · RoIAlign · Convolution · Softmax · Mask R-CNN · Multi-partition Embedding Interaction
