Filling MIDI Velocity using U-Net Image Colorizer
Zhanhong He, David Cooper, Defeng Huang, Roberto Togneri

TL;DR
This paper applies a U-Net image colorization architecture to predict MIDI velocities, enhancing musical expressiveness by treating MIDI data as images and addressing data sparsity, with promising results on piano datasets.
Contribution
Introduces a novel approach using U-Net for MIDI velocity prediction by conceptualizing MIDI as images, incorporating window attention and a custom loss function.
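To make the "MIDI as images" idea concrete, here is a minimal sketch in NumPy. It assumes a piano-roll layout (pitch on one axis, time frames on the other) and uses a masked mean squared error as a stand-in for a sparsity-aware loss. The note tuple format, the function names `notes_to_rolls` and `masked_mse`, and the choice of masked MSE are all illustrative assumptions; this summary does not specify the paper's actual image encoding or loss.

```python
import numpy as np

# Hypothetical note format (an assumption for illustration):
# (pitch, onset_frame, offset_frame, velocity in 0..127)
def notes_to_rolls(notes, n_frames, n_pitches=128):
    """Build two (n_pitches, n_frames) images from a note list.

    mask: 1 where a note sounds, 0 elsewhere (the velocity-free input).
    vel:  velocity scaled to [0, 1] on note pixels (the regression target).
    """
    mask = np.zeros((n_pitches, n_frames), dtype=np.float32)
    vel = np.zeros((n_pitches, n_frames), dtype=np.float32)
    for pitch, start, end, velocity in notes:
        mask[pitch, start:end] = 1.0
        vel[pitch, start:end] = velocity / 127.0
    return mask, vel

def masked_mse(pred, target, mask):
    """Mean squared error averaged over note pixels only.

    Empty pixels dominate a piano roll, so averaging over the whole
    image would dilute the error on the pixels that matter.
    """
    n_active = mask.sum()
    if n_active == 0:
        return 0.0
    return float((((pred - target) ** 2) * mask).sum() / n_active)

# Three overlapping notes on an 8-frame roll.
notes = [(60, 0, 4, 64), (64, 2, 6, 100), (67, 4, 8, 127)]
mask, target = notes_to_rolls(notes, n_frames=8)

# Baseline: every note gets the same flat velocity of 64.
flat = mask * (64 / 127.0)

sparsity = 1.0 - float(mask.mean())              # fraction of empty pixels
loss_masked = masked_mse(flat, target, mask)     # error on note pixels only
loss_plain = float(((flat - target) ** 2).mean())  # error diluted by empty pixels
```

In this toy example almost all pixels are empty, so the plain mean is far smaller than the masked one even though the flat baseline is equally wrong in both cases. That gap is the kind of sparsity effect a custom loss would need to address.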
Findings
Outperforms previous methods in quantitative metrics
Achieves better qualitative listening test results
Effective on piano datasets MAESTRO v3 and SMD
Abstract
Modern music producers commonly use MIDI (Musical Instrument Digital Interface) to store their musical compositions. However, MIDI files created with digital software may lack the expressive characteristics of human performances: the velocity parameter, which controls note loudness, is essentially left undefined and defaults to a flat value. Filling in MIDI velocity is termed MIDI velocity prediction, a task that uses regression models to enhance musical expressiveness by adjusting only this parameter. In this paper, we introduce the U-Net, a widely adopted architecture in image colorization, to this task. By conceptualizing MIDI data as images, we adopt window attention and develop a custom loss function to address the sparsity of MIDI-converted images. Current dataset availability restricts our experiments to piano data. Evaluated on the MAESTRO v3 and SMD datasets, our proposed…
Taxonomy
Topics: Music Technology and Sound Studies · Music and Audio Processing · Neuroscience and Music Perception
