Filling MIDI Velocity using U-Net Image Colorizer

Zhanhong He; David Cooper; Defeng Huang; Roberto Togneri

arXiv:2508.07751 · cs.SD · January 6, 2026

TL;DR

This paper applies U-Net, an architecture widely used for image colorization, to MIDI velocity prediction. It treats MIDI data as images, uses window attention and a custom loss function to handle the sparsity of those images, and is evaluated on the piano datasets MAESTRO v3 and SMD.

Contribution

Introduces a novel approach using U-Net for MIDI velocity prediction by conceptualizing MIDI as images, incorporating window attention and a custom loss function.

Findings

01

Outperforms previous methods in quantitative metrics

02

Achieves better qualitative listening test results

03

Effective on piano datasets MAESTRO v3 and SMD

Abstract

Modern music producers commonly use MIDI (Musical Instrument Digital Interface) to store their musical compositions. However, MIDI files created with digital software may lack the expressive characteristics of human performances, essentially leaving the velocity parameter - a control for note loudness - undefined, which defaults to a flat value. The task of filling MIDI velocity is termed MIDI velocity prediction, which uses regression models to enhance music expressiveness by adjusting only this parameter. In this paper, we introduce the U-Net, a widely adopted architecture in image colorization, to this task. By conceptualizing MIDI data as images, we adopt window attention and develop a custom loss function to address the sparsity of MIDI-converted images. Current dataset availability restricts our experiments to piano data. Evaluated on the MAESTRO v3 and SMD datasets, our proposed…
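To make the "MIDI as image" idea concrete, here is a minimal NumPy sketch. It rasterizes a note list into a binary piano-roll (the "grayscale" input) and a velocity roll (the "color" target), then computes a mean squared error restricted to note pixels so that the empty background does not dominate. The function names, the 50 frames-per-second resolution, and the masked MSE are illustrative assumptions; the abstract does not specify the paper's image encoding or its custom loss.

```python
import numpy as np

def notes_to_rolls(notes, n_frames, fps=50, n_pitches=128):
    """Rasterize notes into (mask, velocity) images of shape (n_pitches, n_frames).

    notes: iterable of (pitch, onset_sec, offset_sec, velocity) tuples.
    fps and the tuple layout are hypothetical choices for this sketch.
    """
    mask = np.zeros((n_pitches, n_frames), dtype=np.float32)
    vel = np.zeros((n_pitches, n_frames), dtype=np.float32)
    for pitch, onset, offset, velocity in notes:
        start = int(round(onset * fps))
        # Every note covers at least one frame, clipped to the image width.
        end = min(max(start + 1, int(round(offset * fps))), n_frames)
        mask[pitch, start:end] = 1.0
        vel[pitch, start:end] = velocity / 127.0  # scale MIDI velocity to [0, 1]
    return mask, vel

def masked_mse(pred, target, mask):
    """Mean squared error over note pixels only (assumed stand-in loss)."""
    n_active = mask.sum()
    if n_active == 0:
        return 0.0
    return float((((pred - target) ** 2) * mask).sum() / n_active)

# Two one-half-second notes; the baseline predicts a flat velocity of 64.
notes = [(60, 0.0, 0.5, 64), (64, 0.5, 1.0, 100)]
mask, vel = notes_to_rolls(notes, n_frames=50)
flat = np.full_like(vel, 64 / 127.0)
loss = masked_mse(flat, vel, mask)
```

In this toy example only 50 of the 6,400 pixels belong to notes, which illustrates the sparsity the abstract refers to: an unmasked loss would be averaged mostly over background pixels that carry no velocity information.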

Taxonomy

Topics: Music Technology and Sound Studies · Music and Audio Processing · Neuroscience and Music Perception