An Empirical Evaluation of End-to-End Polyphonic Optical Music Recognition
Sachinda Edirisooriya, Hao-Wen Dong, Julian McAuley, Taylor Berg-Kirkpatrick

TL;DR
This paper introduces new datasets and models for end-to-end polyphonic optical music recognition, achieving state-of-the-art results by treating the task as multi-sequence detection with novel decoder architectures.
Contribution
It presents two innovative formulations for polyphonic OMR and introduces the RNNDecoder, improving recognition accuracy on complex polyphonic scores.
Findings
RNNDecoder achieves state-of-the-art performance.
New datasets enable large-scale polyphonic OMR evaluation.
Multi-sequence detection outperforms previous methods.
Abstract
Previous work has shown that neural architectures are able to perform optical music recognition (OMR) on monophonic and homophonic music with high accuracy. However, piano and orchestral scores frequently exhibit polyphonic passages, which add a second dimension to the task. Monophonic and homophonic music can be described as homorhythmic, or having a single musical rhythm. Polyphonic music, on the other hand, can be seen as having multiple rhythmic sequences, or voices, concurrently. We first introduce a workflow for creating large-scale polyphonic datasets suitable for end-to-end recognition from sheet music publicly available on the MuseScore forum. We then propose two novel formulations for end-to-end polyphonic OMR -- one treating the problem as a type of multi-task binary classification, and the other treating it as multi-sequence detection. Building upon the encoder-decoder…
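To make the contrast between the two formulations concrete, here is a minimal sketch of how the same polyphonic passage could be turned into training targets under each view. It is only an illustration: the toy vocabulary, the `passage` data, the `<eos>` token, and the helper names `to_binary_targets` and `to_symbol_sequences` are all assumptions made for this example and are not taken from the paper.

```python
# Hypothetical toy vocabulary; the paper's actual symbol set is not shown here.
VOCAB = ["note-C4_quarter", "note-E4_quarter", "note-G4_half", "rest_quarter", "barline"]
INDEX = {sym: i for i, sym in enumerate(VOCAB)}

# A made-up polyphonic passage: each time step lists the symbols that occur together.
passage = [
    ["note-C4_quarter", "note-G4_half"],  # two voices sounding at once
    ["note-E4_quarter"],
    ["barline"],
]


def to_binary_targets(steps):
    """Binary-classification view: one 0/1 flag per vocabulary symbol at each step."""
    rows = []
    for step in steps:
        row = [0] * len(VOCAB)
        for sym in step:
            row[INDEX[sym]] = 1
        rows.append(row)
    return rows


def to_symbol_sequences(steps, end_token="<eos>"):
    """Multi-sequence view: one variable-length symbol sequence per step,
    closed by an (assumed) end-of-sequence token."""
    return [list(step) + [end_token] for step in steps]


binary_targets = to_binary_targets(passage)
symbol_sequences = to_symbol_sequences(passage)
```

In the first view the target size is fixed by the vocabulary, so each symbol becomes its own yes/no prediction. In the second, each time step yields a sequence whose length depends on how many symbols occur there.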
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
