MIDI-VAE: Modeling Dynamics and Instrumentation of Music with Applications to Style Transfer
Gino Brunner, Andres Konrad, Yuyi Wang, Roger Wattenhofer

TL;DR
MIDI-VAE is a Variational Autoencoder-based model for polyphonic, multi-instrument music that also models dynamics (note durations and velocities), enabling style transfer, interpolation, medleys, and mixtures of entire songs.
Contribution
This work introduces MIDI-VAE, which the authors describe as, to the best of their knowledge, the first successful application of neural style transfer to complete musical compositions, handling polyphony, multiple instrument tracks, and dynamics.
Findings
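Performs style transfer on symbolic music (e.g., from Classical to Jazz) by changing pitches, dynamics, and instruments; efficacy is evaluated with separately trained style validation classifiers.
Can interpolate between short pieces, produce medleys, and create mixtures of entire songs.
Interpolations change pitches, dynamics, and instrumentation smoothly, forming a harmonic bridge between two pieces.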
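The interpolation finding can be illustrated with a minimal sketch. It assumes that two pieces have already been encoded into latent vectors by some VAE encoder and that the path between them is a straight line in latent space; the summary does not state which interpolation scheme MIDI-VAE uses, and the function name `interpolate_latents` is hypothetical. Each row of the result would then be passed through a decoder to obtain one intermediate piece.

```python
import numpy as np


def interpolate_latents(z_a, z_b, steps):
    """Return `steps` latent vectors on the straight line from z_a to z_b.

    Assumption: linear interpolation in latent space. This is an
    illustration only, not the authors' confirmed procedure.
    """
    z_a = np.asarray(z_a, dtype=float)
    z_b = np.asarray(z_b, dtype=float)
    if z_a.shape != z_b.shape:
        raise ValueError("latent vectors must have the same shape")
    if steps < 2:
        raise ValueError("steps must be at least 2 to include both endpoints")
    # Mixing weights from 0 (pure z_a) to 1 (pure z_b), endpoints included.
    alphas = np.linspace(0.0, 1.0, steps)
    return np.stack([(1.0 - a) * z_a + a * z_b for a in alphas])


# Toy latent codes standing in for two encoded pieces.
path = interpolate_latents([0.0, 0.0], [2.0, 4.0], steps=5)
```

The first and last rows of `path` equal the two input codes, so decoding the sequence in order would begin at one piece and end at the other.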
Abstract
We introduce MIDI-VAE, a neural network model based on Variational Autoencoders that is capable of handling polyphonic music with multiple instrument tracks, as well as modeling the dynamics of music by incorporating note durations and velocities. We show that MIDI-VAE can perform style transfer on symbolic music by automatically changing pitches, dynamics and instruments of a music piece from, e.g., a Classical to a Jazz style. We evaluate the efficacy of the style transfer by training separate style validation classifiers. Our model can also interpolate between short pieces of music, produce medleys and create mixtures of entire songs. The interpolations smoothly change pitches, dynamics and instrumentation to create a harmonic bridge between two music pieces. To the best of our knowledge, this work represents the first successful attempt at applying neural style transfer to complete…
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Neuroscience and Music Perception
