TrOMR: Transformer-Based Polyphonic Optical Music Recognition

Yixuan Li; Huaping Liu; Qiang Jin; Miaomiao Cai; Peng Li

arXiv:2308.09370 · cs.CL · August 21, 2023

TL;DR

This paper introduces TrOMR, a transformer-based model for end-to-end polyphonic optical music recognition. It outperforms existing methods, especially in real-world scenarios, by combining the transformer's global perceptual capability with a consistency loss function and a revised data annotation approach.

Contribution

The paper presents a novel transformer-based approach for end-to-end polyphonic OMR, including a new loss function and data annotation method to enhance accuracy on complex scores.

Findings

01

TrOMR achieves superior accuracy compared to existing methods.

02

The model performs well on real-world full-page music scores.

03

The approach is validated through extensive experiments and a new dataset.

Abstract

Optical Music Recognition (OMR) is an important technology in music and has been researched for a long time. Previous approaches to OMR are usually based on CNNs for image understanding and RNNs for music symbol classification. In this paper, we propose a transformer-based approach with excellent global perceptual capability for end-to-end polyphonic OMR, called TrOMR. We also introduce a novel consistency loss function and a reasonable approach for data annotation to improve recognition accuracy for complex music scores. Extensive experiments demonstrate that TrOMR outperforms current OMR methods, especially in real-world scenarios. We also develop a TrOMR system and build a camera scene dataset of full-page music scores captured in real-world settings. The code and datasets will be made available for reproducibility.

Code & Models

Repositories

netease/polyphonic-tromr (PyTorch, official)

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies