TrOMR: Transformer-Based Polyphonic Optical Music Recognition
Yixuan Li, Huaping Liu, Qiang Jin, Miaomiao Cai, Peng Li

TL;DR
This paper introduces TrOMR, a transformer-based model for polyphonic optical music recognition that outperforms existing methods, especially in real-world scenarios, by leveraging global perceptual capabilities and novel training techniques.
Contribution
The paper presents a novel transformer-based approach for end-to-end polyphonic OMR, including a new loss function and data annotation method to enhance accuracy on complex scores.
Findings
TrOMR achieves superior accuracy compared to existing methods.
The model performs well on real-world full-page music scores.
The approach is validated through extensive experiments and a new dataset.
Abstract
Optical Music Recognition (OMR) is an important technology in music and has been researched for a long time. Previous approaches to OMR are usually based on CNNs for image understanding and RNNs for music symbol classification. In this paper, we propose TrOMR, a transformer-based approach with excellent global perceptual capability for end-to-end polyphonic OMR. We also introduce a novel consistency loss function and a reasonable approach to data annotation to improve recognition accuracy on complex music scores. Extensive experiments demonstrate that TrOMR outperforms current OMR methods, especially in real-world scenarios. We also develop a TrOMR system and build a camera-scene dataset of full-page music scores captured in real-world settings. The code and datasets will be made available for reproducibility.
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
