From Image to Music Language: A Two-Stage Structure Decoding Approach for Complex Polyphonic OMR
Nan Xu, Shiheng Li, Shengchao Hou

TL;DR
This paper introduces a two-stage optical music recognition (OMR) system for complex polyphonic scores. Its second stage uses topology recognition, supported by a data strategy that combines procedural generation with recognition-feedback annotations, to improve score decoding accuracy.
Contribution
The paper presents a structure decoding approach for complex polyphonic OMR, built on topology recognition, together with a data strategy that combines procedural generation with recognition-feedback annotations.
Findings
Effective decoding of complex polyphonic scores, especially piano scores, is demonstrated.
Topology recognition improves structure decoding accuracy.
The data strategy enhances training by combining procedural generation with recognition-feedback annotations.
Abstract
We propose a new approach for a practical two-stage Optical Music Recognition (OMR) pipeline, with a particular focus on its second stage. Given symbol and event candidates from the visual pipeline, we decode them into an editable, verifiable, and exportable score structure. We focus on complex polyphonic staff notation, especially piano scores, where voice separation and intra-measure timing are the main bottlenecks. Our approach formulates second-stage decoding as a structure decoding problem and uses topology recognition with probability-guided search (BeadSolver) as its core method. We also describe a data strategy that combines procedural generation with recognition-feedback annotations. The result is a practical decoding component for real OMR systems and a path to accumulate structured score data for future end-to-end, multimodal, and RL-style methods.
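The abstract does not spell out how BeadSolver works, so the following is only a minimal sketch of what a probability-guided search over voice topology could look like, not the authors' implementation. It assumes a hypothetical input format: the events of one measure sorted left to right, a duration per event, an assumed probability that each event starts a voice (`start_prob`), and an assumed probability that event j directly follows event i in the same voice (`succ_prob`). The search picks one predecessor per event, derives onsets from the resulting chains, and prunes any chain that overflows the measure.

```python
import heapq
from fractions import Fraction
from math import log

START = -1  # marker meaning "this event begins a new voice at onset 0"


def decode_measure(durations, start_prob, succ_prob, measure_len):
    """Uniform-cost search over predecessor assignments (illustrative only).

    durations   : list of Fraction, one per event, events sorted left to right
    start_prob  : start_prob[j] = assumed probability that event j starts a voice
    succ_prob   : dict {(i, j): p}, assumed probability that j follows i in a voice
    measure_len : Fraction, total duration allowed for any single voice

    Returns (preds, onsets) for the most probable consistent assignment,
    or None if no assignment satisfies the timing constraint.
    """
    n = len(durations)
    # Heap entries: (cost, tie_breaker, preds, onsets); cost = -sum(log p).
    heap = [(0.0, 0, (), ())]
    counter = 1
    while heap:
        cost, _, preds, onsets = heapq.heappop(heap)
        j = len(preds)
        if j == n:
            # Costs are non-negative, so the first complete state is optimal.
            return list(preds), list(onsets)
        used = {p for p in preds if p != START}  # events that already have a successor
        options = [(START, start_prob[j], Fraction(0))]
        for i in range(j):
            if i not in used:
                options.append((i, succ_prob.get((i, j), 0.0), onsets[i] + durations[i]))
        for pred, p, onset in options:
            # Prune impossible links and voices that would overflow the measure.
            if p <= 0.0 or onset + durations[j] > measure_len:
                continue
            heapq.heappush(heap, (cost - log(p), counter, preds + (pred,), onsets + (onset,)))
            counter += 1
    return None


# Toy 4/4 measure (whole note = 1): upper voice half + half,
# lower voice quarter + dotted half. All probabilities are made up.
durations = [Fraction(1, 2), Fraction(1, 4), Fraction(3, 4), Fraction(1, 2)]
start_prob = [0.9, 0.8, 0.1, 0.1]
succ_prob = {(0, 2): 0.3, (0, 3): 0.6, (1, 2): 0.6, (1, 3): 0.3, (2, 3): 0.2}
preds, onsets = decode_measure(durations, start_prob, succ_prob, Fraction(1))
```

In this toy measure the search links event 2 to event 1 and event 3 to event 0, which recovers two voices with onsets 0, 0, 1/4, and 1/2. Linking event 2 to event 0 is rejected because that voice would last 5/4 of a measure, which illustrates how an intra-measure timing constraint can resolve voice separation that the probabilities alone leave ambiguous.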
