Transcoda: End-to-End Zero-Shot Optical Music Recognition via Data-Centric Synthetic Training

Daniel Dratschuk; Paul Swoboda

arXiv:2605.10835 · cs.CV · May 12, 2026

TL;DR

Transcoda introduces a data-centric synthetic training approach with normalization and grammar-based decoding, enabling efficient end-to-end zero-shot optical music recognition that outperforms larger models.

Contribution

The paper presents a synthetic data generation pipeline, a normalization of the **kern music encoding to a unique normal form, and grammar-based decoding, which together improve OMR accuracy without large annotated datasets of real scans.

Findings

01 Outperforms state-of-the-art baselines on synthetic benchmark with 18.46% OMR-NED

02 Reduces error rate on historical Polish scans to 63.97% OMR-NED

03 Trains a 59M-parameter model in 6 hours on a single GPU
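
This page does not define OMR-NED. As a rough aid for reading the percentages above, the sketch below computes a generic normalized edit distance over token sequences, where lower is better. It assumes the metric behaves like a token-level Levenshtein distance divided by the longer sequence length; the actual OMR-NED definition may tokenize and normalize differently, and the function names are illustrative only.

```python
def levenshtein(a, b):
    """Token-level edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(pred, ref):
    """Edit distance divided by the longer length (assumed convention)."""
    longest = max(len(pred), len(ref))
    return levenshtein(pred, ref) / longest if longest else 0.0


# Toy token sequences; real transcriptions are far longer.
reference = "4c 4d 4e 4f".split()
prediction = "4c 4d 8e 4f 4g".split()
ned = normalized_edit_distance(prediction, reference)
print(f"{100 * ned:.2f}%")
```

Under this convention, a score of 0% means the prediction matches the reference exactly, and larger values mean more edits are needed relative to sequence length.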

Abstract

Optical Music Recognition (OMR), the task of transcribing sheet music into a structured textual representation, is currently bottlenecked by a lack of large-scale, annotated datasets of real scans. This forces models to rely on either few-shot transfer or synthetic training pipelines that remain overly simplistic. A secondary challenge is encoding non-uniqueness: in the popular Humdrum **kern format for transcribing music, multiple different text encodings can render into the same visual sheet music. This one-to-many mapping creates a harder learning task and introduces high uncertainty during decoding. We propose Transcoda, an OMR system built on (i) an advanced synthetic data generation pipeline, (ii) a normalization of the **kern encoding to enforce a unique normal form and (iii) grammar-based decoding to ensure the syntactic correctness of the output. This approach allows us to…
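
To make the idea of grammar-based decoding concrete, here is a minimal sketch of greedy decoding constrained by a finite-state grammar. It is not the authors' implementation: the vocabulary, the grammar, and the per-step scores are all hypothetical stand-ins, and a real **kern grammar would be much richer. The point is only that tokens the grammar disallows in the current state are masked out before the best token is chosen, so the output is syntactically valid by construction.

```python
import math

# Hypothetical toy grammar as a finite-state machine:
# state -> {allowed token: next state}. Not the real **kern grammar.
GRAMMAR = {
    "start": {"<dur>": "dur", "<eos>": "done"},
    "field": {"<dur>": "dur"},
    "dur": {"<pitch>": "note"},
    "note": {"<tab>": "field", "<newline>": "start"},
}


def constrained_greedy_decode(step_scores, grammar=GRAMMAR, start="start"):
    """Pick the best-scoring token per step among those the grammar allows."""
    state, output = start, []
    for scores in step_scores:
        allowed = grammar[state]
        # Mask tokens that are invalid in the current grammar state.
        masked = {tok: (s if tok in allowed else -math.inf)
                  for tok, s in scores.items()}
        token = max(masked, key=masked.get)
        output.append(token)
        state = allowed[token]
        if state == "done":
            break
    return output


# Stand-in for model scores; an unconstrained argmax would start with
# "<pitch>", which the toy grammar does not allow at the start of a line.
step_scores = [
    {"<dur>": 0.1, "<pitch>": 0.9, "<tab>": 0.0, "<newline>": 0.0, "<eos>": 0.0},
    {"<dur>": 0.2, "<pitch>": 0.5, "<tab>": 0.1, "<newline>": 0.1, "<eos>": 0.1},
    {"<dur>": 0.6, "<pitch>": 0.1, "<tab>": 0.05, "<newline>": 0.2, "<eos>": 0.05},
    {"<dur>": 0.1, "<pitch>": 0.1, "<tab>": 0.1, "<newline>": 0.1, "<eos>": 0.6},
]
decoded = constrained_greedy_decode(step_scores)
unconstrained = [max(s, key=s.get) for s in step_scores]
print(decoded)
print(unconstrained)
```

In a neural decoder the same masking would typically be applied to the logits before the softmax, and it combines with beam search as readily as with the greedy selection used here.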

Code & Models

Models

Hugging Face: btrkeks/transcoda-59M-zeroshot-v1 (model, 104 downloads)
