Score-Informed Transformer for Refining MIDI Velocity in Automatic Music Transcription

Zhanhong He; Roberto Togneri; David Huang

arXiv:2508.07757 · eess.AS · March 3, 2026

Open Access

TL;DR

This paper introduces a lightweight, modular Transformer-based correction module that refines MIDI velocity estimates in automatic music transcription, improving accuracy and generalization across datasets.

Contribution

The paper presents a score-informed Transformer module that plugs into existing AMT systems and achieves state-of-the-art velocity estimation performance while adding only about 1 million parameters.

Findings

01 Reduces velocity estimation errors on MAESTRO dataset
02 Improves cross-dataset generalization to SMD and MAPS
03 Outperforms existing methods with only 1 million additional parameters

Abstract

MIDI velocity is crucial for capturing expressive dynamics in human performances. In practical scenarios, a music score with inaccurate velocities may be available alongside the performance audio (e.g., music education and free online archives), enabling the task of score-informed MIDI velocity estimation. In this work, we propose a modular, lightweight score-informed Transformer correction module that refines the velocity estimates of Automatic Music Transcription (AMT) systems. We integrate the proposed module into multiple AMT systems (HPT, HPPNet, and DynEst). Trained exclusively on the MAESTRO training split, our method consistently reduces velocity estimation errors on MAESTRO and improves cross-dataset generalization to SMD and MAPS datasets. Under this training protocol, integrating our score-informed module with HPT (named Score-HPT) establishes a new state-of-the-art…
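
To make "velocity estimation error" concrete, the following is a minimal sketch of one plausible way to score velocity estimates in the score-informed setting. It is an illustration under stated assumptions, not the paper's evaluation code: the `Note` type, the `velocity_mae` helper, the choice of mean absolute error, and all numbers are hypothetical. It assumes that the score fixes which notes exist, so reference and estimated notes can be paired one-to-one by position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    """Hypothetical note record used only for this illustration."""
    pitch: int      # MIDI pitch number (0-127)
    onset: float    # onset time in seconds
    velocity: int   # MIDI velocity (1-127)


def velocity_mae(reference, estimated):
    """Mean absolute velocity error over notes paired by position.

    Assumes both lists describe the same notes in the same order,
    which is natural when the note list comes from an aligned score.
    """
    if len(reference) != len(estimated):
        raise ValueError("note lists must have the same length")
    if not reference:
        raise ValueError("note lists must not be empty")
    total = 0
    for ref, est in zip(reference, estimated):
        if ref.pitch != est.pitch:
            raise ValueError("paired notes must share the same pitch")
        total += abs(ref.velocity - est.velocity)
    return total / len(reference)


# Made-up ground-truth performance velocities.
reference = [Note(60, 0.0, 80), Note(64, 0.5, 50), Note(67, 1.0, 100)]
# A score with inaccurate (here: flat) velocities.
score_only = [Note(60, 0.0, 64), Note(64, 0.5, 64), Note(67, 1.0, 64)]
# Made-up velocities after a hypothetical refinement step.
refined = [Note(60, 0.0, 76), Note(64, 0.5, 55), Note(67, 1.0, 92)]

print(velocity_mae(reference, score_only))
print(velocity_mae(reference, refined))
```

On this toy data the flat score velocities are further from the reference than the refined ones, which is the kind of gap a correction module aims to close.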

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing