Optical Music Recognition with Convolutional Sequence-to-Sequence Models

Eelco van der Wel; Karen Ullrich

arXiv:1707.04877·cs.CV·July 18, 2017·46 cites

TL;DR

This paper introduces a convolutional sequence-to-sequence model for optical music recognition, trained on a new, large, publicly available dataset. It reaches 80% note-level accuracy and is reported to outperform commercial methods.

Contribution

It presents an OMR model that is a step towards an end-to-end trainable pipeline: it learns from full sentences of sheet music rather than individually labeled symbols, using a large publicly available dataset and image augmentations based on real-world scenarios.

Findings

01 Pitch recognition accuracy of 81%

02 Duration accuracy of 94%

03 Note-level accuracy of 80%
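
The three figures above are related: if a note counts as correct only when both its pitch and its duration are correct, note-level accuracy can never exceed either of the other two. The sketch below illustrates that relationship under stated assumptions. It is not the paper's evaluation code; the function name `accuracies`, the `(pitch, duration)` tuple encoding, and the requirement that predicted and target sequences are already aligned and of equal length are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: each note is a (pitch, duration) pair, e.g. ("C4", "quarter").
Note = Tuple[str, str]


def accuracies(predicted: List[Note], target: List[Note]) -> Dict[str, float]:
    """Return pitch, duration, and note-level accuracy for aligned sequences.

    Assumption: predicted[i] corresponds to target[i]; no alignment is done here.
    """
    if len(predicted) != len(target):
        raise ValueError("sequences must have the same length")
    if not target:
        raise ValueError("sequences must not be empty")

    n = len(target)
    pitch_hits = sum(p[0] == t[0] for p, t in zip(predicted, target))
    duration_hits = sum(p[1] == t[1] for p, t in zip(predicted, target))
    # A note is correct only if both pitch and duration match.
    note_hits = sum(p == t for p, t in zip(predicted, target))

    return {
        "pitch": pitch_hits / n,
        "duration": duration_hits / n,
        "note": note_hits / n,
    }


target = [("C4", "quarter"), ("D4", "quarter"), ("E4", "half"), ("F4", "eighth")]
predicted = [("C4", "quarter"), ("D#4", "quarter"), ("E4", "quarter"), ("F4", "eighth")]
result = accuracies(predicted, target)
print(result)
```

In this toy input one note has a wrong pitch and a different note has a wrong duration, so the note-level score is lower than both of the other two, the same ordering seen in the reported 81%, 94%, and 80%.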

Abstract

Optical Music Recognition (OMR) is an important technology within Music Information Retrieval. Deep learning models show promising results on OMR tasks, but symbol-level annotated data sets of sufficient size to train such models are not available and difficult to develop. We present a deep learning architecture called a Convolutional Sequence-to-Sequence model to both move towards an end-to-end trainable OMR pipeline, and apply a learning process that trains on full sentences of sheet music instead of individually labeled symbols. The model is trained and evaluated on a human generated data set, with various image augmentations based on real-world scenarios. This data set is the first publicly available set in OMR research with sufficient size to train and evaluate deep learning models. With the introduced augmentations a pitch recognition accuracy of 81% and a duration accuracy of 94%…

Taxonomy

Topics: Music and Audio Processing