Score-Informed Transformer for Refining MIDI Velocity in Automatic Music Transcription
Zhanhong He, Roberto Togneri, David Huang

TL;DR
This paper introduces a lightweight, modular Transformer-based correction module that refines MIDI velocity estimates in automatic music transcription, improving accuracy and generalization across datasets.
Contribution
It presents a novel score-informed Transformer module integrated into existing AMT systems, achieving state-of-the-art velocity estimation performance with minimal additional parameters.
Findings
Reduces velocity estimation errors on the MAESTRO dataset
Improves cross-dataset generalization to the SMD and MAPS datasets
Outperforms existing methods with only 1 million additional parameters
Abstract
MIDI velocity is crucial for capturing expressive dynamics in human performances. In practical scenarios, a music score with inaccurate velocities may be available alongside the performance audio (e.g., music education and free online archives), enabling the task of score-informed MIDI velocity estimation. In this work, we propose a modular, lightweight score-informed Transformer correction module that refines the velocity estimates of Automatic Music Transcription (AMT) systems. We integrate the proposed module into multiple AMT systems (HPT, HPPNet, and DynEst). Trained exclusively on the MAESTRO training split, our method consistently reduces velocity estimation errors on MAESTRO and improves cross-dataset generalization to the SMD and MAPS datasets. Under this training protocol, integrating our score-informed module with HPT (named Score-HPT) establishes a new state-of-the-art…
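To make the idea of a score-informed correction module concrete, here is a minimal NumPy sketch. It assumes a per-note design: each transcribed note carries its AMT velocity estimate, the (possibly inaccurate) score velocity, and its pitch. A single self-attention layer mixes information across notes and predicts a residual that is added to the AMT estimate. The class name `ScoreInformedCorrector`, the feature set, the layer sizes and the residual formulation are hypothetical illustrations, not the authors' architecture, and the weights are random rather than trained on MAESTRO.

```python
import numpy as np

MIDI_MAX = 127.0  # MIDI velocity is a 7-bit value in [0, 127]


def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalize each note's feature vector to zero mean and unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over the note sequence."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


class ScoreInformedCorrector:
    """Hypothetical one-layer corrector with random (untrained) weights."""

    def __init__(self, d_model: int = 32, seed: int = 0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(d_model)
        # Assumed per-note input features: AMT velocity, score velocity, pitch.
        self.w_in = rng.normal(0.0, 0.5, (3, d_model))
        self.w_q = rng.normal(0.0, s, (d_model, d_model))
        self.w_k = rng.normal(0.0, s, (d_model, d_model))
        self.w_v = rng.normal(0.0, s, (d_model, d_model))
        # Small output init so the untrained module only nudges the estimate.
        self.w_out = rng.normal(0.0, 0.01, (d_model, 1))

    def __call__(self, amt_velocity, score_velocity, pitch):
        amt_velocity = np.asarray(amt_velocity, dtype=float)
        feats = np.stack(
            [amt_velocity,
             np.asarray(score_velocity, dtype=float),
             np.asarray(pitch, dtype=float)],
            axis=-1,
        ) / MIDI_MAX                                 # (num_notes, 3), scaled to ~[0, 1]
        h = feats @ self.w_in                        # (num_notes, d_model)
        h = layer_norm(h + self_attention(h, self.w_q, self.w_k, self.w_v))
        delta = (h @ self.w_out)[:, 0] * MIDI_MAX    # residual correction in MIDI units
        # Refine the AMT estimate rather than predicting velocity from scratch.
        return np.clip(amt_velocity + delta, 0.0, MIDI_MAX)


corrector = ScoreInformedCorrector()
amt = np.array([64, 70, 58, 90])    # velocities from a base AMT system
score = np.array([60, 60, 60, 80])  # velocities written in the score
pitch = np.array([60, 64, 67, 72])  # MIDI note numbers
refined = corrector(amt, score, pitch)
```

The residual form reflects the modular framing in the abstract: the base AMT system (e.g. HPT) is left untouched, and the add-on only learns how far to move each of its velocity estimates given the score.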
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing
