Sketching the Expression: Flexible Rendering of Expressive Piano Performance with Self-Supervised Learning
Seungyeon Rhyu, Sarah Kim, Kyogu Lee

TL;DR
This paper introduces a self-supervised, hierarchical variational autoencoder system for flexible, expressive piano performance rendering that disentangles musical attributes and allows independent control of expression and structure.
Contribution
It presents a novel self-supervised, hierarchical VAE framework that disentangles and independently controls musical expression and structure in piano performance rendering.
Findings
Stable generation of performance parameters from scores
Disentangled representations of musical attributes
Independent control over musical expression
Abstract
We propose a system for rendering a symbolic piano performance with flexible musical expression. It is necessary to actively control musical expression for creating a new music performance that conveys various emotions or nuances. However, previous approaches were limited to following the composer's guidelines of musical expression or dealing with only a part of the musical attributes. We aim to disentangle the entire musical expression and structural attribute of piano performance using a conditional VAE framework. It stochastically generates expressive parameters from latent representations and given note structures. In addition, we employ self-supervised approaches that force the latent variables to represent target attributes. Finally, we leverage a two-step encoder and decoder that learn hierarchical dependency to enhance the naturalness of the output. Experimental results show…
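The generation step described in the abstract (sampling a latent representation and decoding it together with a given note structure) can be illustrated with a minimal NumPy sketch. This is not the authors' model: the linear `decode` function, the dimensions, the choice of three expressive parameters per note, and all variable names are assumptions made for illustration. It shows only the generic conditional-VAE ingredients, namely reparameterized sampling, the KL term against a standard normal prior, and decoding conditioned on the score.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, I)), summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

def decode(z, note_structure, weights):
    """Hypothetical linear decoder (an assumption, not the paper's network).

    The same latent vector is tiled across notes and concatenated with
    per-note conditioning features before a single linear map.
    """
    n_notes = note_structure.shape[0]
    z_tiled = np.tile(z, (n_notes, 1))
    inputs = np.concatenate([z_tiled, note_structure], axis=1)
    return inputs @ weights

# Illustrative sizes only; none of these come from the paper.
latent_dim, n_notes, cond_dim, n_params = 4, 8, 3, 3

mu = np.zeros(latent_dim)        # stand-in for an encoder's mean output
log_var = np.zeros(latent_dim)   # stand-in for an encoder's log-variance
note_structure = rng.standard_normal((n_notes, cond_dim))
weights = rng.standard_normal((latent_dim + cond_dim, n_params))

# Two different latent samples rendered over the same note structure.
z_a = reparameterize(mu, log_var, rng)
z_b = reparameterize(mu, log_var, rng)
performance_a = decode(z_a, note_structure, weights)
performance_b = decode(z_b, note_structure, weights)
kl = kl_to_standard_normal(mu, log_var)
```

Decoding two latent samples against the same `note_structure` mirrors the idea of varying expression while holding the score fixed; in the actual system the latent variables are additionally shaped by self-supervised objectives and a hierarchical encoder and decoder, which this sketch does not attempt to model.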
Taxonomy
Topics: Music Technology and Sound Studies · Music and Audio Processing · Neuroscience and Music Perception
