Learning Interpretable Representation for Controllable Polyphonic Music Generation
Ziyu Wang, Dingsu Wang, Yixiao Zhang, Gus Xia

TL;DR
This paper introduces a VAE-based model that learns interpretable latent factors of polyphonic music, enabling controllable generation such as style transfer and texture variation.
Contribution
It proposes a novel architecture for disentangling chord and texture in polyphonic music within a VAE framework, enhancing controllability.
Findings
Successful disentanglement of chord and texture factors
High-quality controllable music generation demonstrated
Applications include style transfer and accompaniment arrangement
Abstract
While deep generative models have become the leading methods for algorithmic composition, it remains a challenging problem to control the generation process because the latent variables of most deep-learning models lack good interpretability. Inspired by the content-style disentanglement idea, we design a novel architecture, under the VAE framework, that effectively learns two interpretable latent factors of polyphonic music: chord and texture. The current model focuses on learning 8-beat-long piano composition segments. We show that such chord-texture disentanglement provides a controllable generation pathway leading to a wide spectrum of applications, including compositional style transfer, texture variation, and accompaniment arrangement. Both objective and subjective evaluations show that our method achieves successful disentanglement and high-quality controlled music generation.
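The control pathway described in the abstract can be illustrated with a toy sketch. It assumes each segment is encoded into two separate Gaussian latent factors, one for chord and one for texture, and that style transfer amounts to decoding the chord latent of one piece together with the texture latent of another. The abstract does not describe the paper's encoders or decoder, so they are not modelled here. All function names, dimensions, and posterior parameters below are hypothetical and chosen only for illustration.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    # Standard VAE sampling: z = mu + sigma * eps, eps ~ N(0, 1), per dimension
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    # KL(N(mu, sigma^2) || N(0, 1)) summed over dimensions (VAE regulariser)
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def swap_factors(z_a, z_b):
    # Hypothetical helper: keep the chord latent of A, take the texture latent of B
    return {"chord": z_a["chord"], "texture": z_b["texture"]}


rng = random.Random(0)

# Made-up posterior parameters standing in for the outputs of two encoders
piece_a = {
    "chord": reparameterize([0.5, -0.2], [-1.0, -1.0], rng),
    "texture": reparameterize([0.1, 0.3], [-1.0, -1.0], rng),
}
piece_b = {
    "chord": reparameterize([-0.4, 0.9], [-1.0, -1.0], rng),
    "texture": reparameterize([0.7, -0.6], [-1.0, -1.0], rng),
}

# A decoder (not modelled here) would turn this pair back into a piano segment
mixed = swap_factors(piece_a, piece_b)
```

In this sketch, texture variation would correspond to resampling only the texture latent while holding the chord latent fixed. Whether the paper's model uses exactly this procedure is not stated in the abstract.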
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Generative Adversarial Networks and Image Synthesis
