Listen to Your Favorite Melodies with img2Mxml, Producing MusicXML from Sheet Music Image by Measure-based Multimodal Deep Learning-driven Assembly
Tomoyuki Shishido, Fehmiju Fati, Daisuke Tokushige, and Yasuhiro Ono

TL;DR
This paper introduces img2Mxml, a deep learning-based system that converts sheet music images into MusicXML by accurately extracting and assembling measures, improving optical music recognition precision.
Contribution
The paper presents a novel measure-based multimodal deep learning assembly method for end-to-end optical music recognition from various sheet music images.
Findings
Effective measure extraction and alignment from diverse images
Accurate recognition of musical symbols including chords
Enhanced end-to-end OMR precision
Abstract
Deep learning has recently been applied to optical music recognition (OMR). However, OMR processing of diverse sheet music images still lacks the precision needed to be widely applicable. Here, we present MMdA (Measure-based Multimodal deep learning (DL)-driven Assembly), a method that allows end-to-end OMR processing of various images, including inclined photo images. With this method, measures are extracted by a deep learning model, aligned, and resized, and then used to infer the given musical symbol components with multiple deep learning models run in sequence or in parallel. Using standardized measures enables efficient training of the models and accurate adjustment of the five staff lines in each measure. Multiple musical symbol component category models, each with a small number of feature types, can represent a diverse set of notes and other musical symbols, including chords. This…
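The measure-based flow described in the abstract (standardize each measure, infer its symbols, then assemble the results) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `resize_nearest` and `assemble_musicxml`, the fixed output size, and the hand-written `predicted` list that stands in for the deep learning models' output are all assumptions made for illustration. Only the Python standard library is used.

```python
import xml.etree.ElementTree as ET


def resize_nearest(image, out_h, out_w):
    """Resize a 2D grayscale image (list of rows) with nearest-neighbour sampling.

    Hypothetical stand-in for the step that brings every measure to one standard size.
    """
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def assemble_musicxml(measures, divisions=1):
    """Assemble per-measure predictions into a minimal MusicXML tree.

    `measures` is a list of measures; each measure is a list of events, and each
    event is (pitches, duration), where pitches is a list of (step, octave).
    An event with more than one pitch is written as a chord.
    """
    root = ET.Element("score-partwise", version="3.1")
    part_list = ET.SubElement(root, "part-list")
    score_part = ET.SubElement(part_list, "score-part", id="P1")
    ET.SubElement(score_part, "part-name").text = "Music"
    part = ET.SubElement(root, "part", id="P1")
    for number, events in enumerate(measures, start=1):
        measure = ET.SubElement(part, "measure", number=str(number))
        if number == 1:
            # MusicXML states the duration unit once, in the first measure.
            attributes = ET.SubElement(measure, "attributes")
            ET.SubElement(attributes, "divisions").text = str(divisions)
        for pitches, duration in events:
            for index, (step, octave) in enumerate(pitches):
                note = ET.SubElement(measure, "note")
                if index > 0:
                    # Every chord note after the first carries an empty <chord/>.
                    ET.SubElement(note, "chord")
                pitch = ET.SubElement(note, "pitch")
                ET.SubElement(pitch, "step").text = step
                ET.SubElement(pitch, "octave").text = str(octave)
                ET.SubElement(note, "duration").text = str(duration)
    return root


# Assumed model output: a C-E chord in measure 1, a single G in measure 2.
predicted = [
    [([("C", 4), ("E", 4)], 4)],
    [([("G", 4)], 4)],
]
score = assemble_musicxml(predicted)
print(ET.tostring(score, encoding="unicode"))
```

Keeping the per-measure predictions as plain data before serialization means each measure can be processed independently and the final document is built in one pass, which is one plausible reading of "assembly" in the method's name.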
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Handwritten Text Recognition Techniques
