A High-Accuracy Optical Music Recognition Method Based on Bottleneck Residual Convolutions
Junwen Ma, Huhu Xue, Xingyuan Zhao, and Weicheng Fu

TL;DR
This paper introduces an end-to-end optical music recognition framework that combines residual bottleneck convolutions with BiGRU sequence modeling and reports high accuracy and fast training on the Camera-PrIMuS and PrIMuS datasets.
Contribution
The novel integration of residual bottleneck CNNs with BiGRU for OMR improves accuracy and computational efficiency over existing methods.
Findings
Achieves a SeER of 7.52% on the Camera-PrIMuS dataset.
Attains a symbol error rate of 0.45% with high pitch and note accuracy.
Reports a fast training time of 1.74 seconds per epoch.
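The error rates above can be made concrete with a small sketch. It assumes that SeER denotes a sequence-level error rate (the share of samples with at least one wrong symbol) and that the symbol error rate is the edit distance between prediction and reference divided by the reference length. The paper's exact definitions may differ, and the function names and token strings below are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference token
                            curr[j - 1] + 1,      # insert a predicted token
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def symbol_error_rate(refs, hyps):
    """Total edits over total reference symbols (assumed definition)."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_symbols = sum(len(r) for r in refs)
    return total_edits / total_symbols


def sequence_error_rate(refs, hyps):
    """Share of samples whose prediction is not an exact match (assumed definition)."""
    wrong = sum(1 for r, h in zip(refs, hyps) if list(r) != list(h))
    return wrong / len(refs)


# Hypothetical token sequences, for illustration only.
refs = [["clef-G2", "note-C4_quarter", "barline"],
        ["clef-G2", "note-D4_half"]]
hyps = [["clef-G2", "note-C4_quarter", "barline"],
        ["clef-G2", "note-E4_half"]]

print(symbol_error_rate(refs, hyps))    # 0.2 (1 substitution over 5 symbols)
print(sequence_error_rate(refs, hyps))  # 0.5 (1 of 2 sequences is wrong)
```

Under these assumed definitions, one wrong symbol makes the whole sequence count as an error, which would explain why a sequence-level rate sits well above a symbol-level rate.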
Abstract
Optical Music Recognition (OMR) aims to convert printed or handwritten music score images into editable symbolic representations. This paper presents an end-to-end OMR framework that combines residual bottleneck convolutions with bidirectional gated recurrent unit (BiGRU)-based sequence modeling. A convolutional neural network with ResNet-v2-style residual bottleneck blocks and multi-scale dilated convolutions is used to extract features that encode both fine-grained symbol details and global staff-line structures. The extracted feature sequences are then fed into a BiGRU network to model temporal dependencies among musical symbols. The model is trained using the Connectionist Temporal Classification loss, enabling end-to-end prediction without explicit alignment annotations. Experimental results on the Camera-PrIMuS and PrIMuS datasets demonstrate the effectiveness of the proposed…
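To illustrate why Connectionist Temporal Classification needs no alignment annotations, the sketch below shows how a per-frame output sequence could be turned into a symbol sequence: take the best label per frame, collapse consecutive repeats, then drop the blank label. Greedy (best-path) decoding is an assumption here, since the abstract names only the CTC loss and not the decoder, and the vocabulary and scores are hypothetical.

```python
def ctc_greedy_decode(frame_scores, blank=0):
    """Best-path CTC decoding: argmax per frame, merge repeats, remove blanks."""
    # Index of the highest-scoring label in each frame.
    best_path = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]
    decoded = []
    prev = None
    for label in best_path:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded


# Hypothetical vocabulary; index 0 is the CTC blank.
vocab = ["<blank>", "clef-G2", "note-C4_quarter"]

# Hypothetical per-frame scores, one row per image column.
frame_scores = [
    [0.10, 0.80, 0.10],  # clef
    [0.20, 0.70, 0.10],  # clef again (repeat, merged)
    [0.90, 0.05, 0.05],  # blank
    [0.10, 0.10, 0.80],  # note
    [0.60, 0.20, 0.20],  # blank separates two identical notes
    [0.10, 0.20, 0.70],  # note
]

ids = ctc_greedy_decode(frame_scores)
print([vocab[i] for i in ids])
# ['clef-G2', 'note-C4_quarter', 'note-C4_quarter']
```

The blank between the last two frames is what lets the same note appear twice in a row in the output. Without it, the two frames would be merged into a single symbol.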
