Is Disentanglement enough? On Latent Representations for Controllable Music Generation

Ashis Pati; Alexander Lerch

arXiv:2108.01450 · cs.SD · August 4, 2021 · 6 cites

TL;DR

This paper investigates whether disentangled latent representations in VAEs are sufficient for controllable music generation, highlighting the importance of decoder structure and proposing evaluation metrics.

Contribution

The study systematically analyzes the link between disentanglement and controllability in VAEs, emphasizing the decoder's role and introducing new evaluation methods.

Findings

01. High disentanglement does not guarantee controllability without a strong decoder.

02. Latent space structure significantly influences attribute manipulation.

03. Proposed metrics effectively evaluate controllability in latent spaces.
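The findings separate disentanglement, which is a property of the encoder, from controllability, which asks whether changing a latent dimension actually changes the attribute in the decoded output. Below is a minimal sketch of one way to score controllability. It sweeps a single latent dimension, decodes each code, measures the attribute, and reports the Spearman rank correlation between the requested values and the measured ones. This is an illustrative assumption and not necessarily one of the metrics proposed in the paper. The toy decoders, `mean_pitch`, `controllability`, and all other names are hypothetical.

```python
def rank(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation; returns 0.0 if either input is constant."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def controllability(decode, attribute, z, dim, sweep):
    """Set latent dimension `dim` of code `z` to each value in `sweep`, decode,
    measure the attribute, and return the rank correlation between the
    requested values and the measured attribute."""
    measured = []
    for value in sweep:
        z_mod = list(z)
        z_mod[dim] = value
        measured.append(attribute(decode(z_mod)))
    return spearman(sweep, measured)


STEPS = (0, 2, 4, 5, 7, 5, 4, 2)  # hypothetical melodic contour in semitones


def strong_decoder(z):
    # Hypothetical decoder whose register (MIDI pitch) follows z[0].
    base = 60 + round(6 * z[0])
    return [base + s for s in STEPS]


def weak_decoder(z):
    # Hypothetical decoder that ignores z[0], even if an encoder has
    # perfectly disentangled the attribute into that dimension.
    base = 60 + round(6 * z[1])
    return [base + s for s in STEPS]


def mean_pitch(melody):
    """Hypothetical attribute: average MIDI pitch of the melody."""
    return sum(melody) / len(melody)


if __name__ == "__main__":
    z = [0.0, 0.3]
    sweep = [-1.0, -0.5, 0.0, 0.5, 1.0]
    print(controllability(strong_decoder, mean_pitch, z, 0, sweep))  # 1.0
    print(controllability(weak_decoder, mean_pitch, z, 0, sweep))    # 0.0
```

In this toy, both decoders could sit behind the same well-disentangled encoder. Only the decoder that actually reads `z[0]` yields a controllable attribute, which mirrors the first finding in miniature.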

Abstract

Improving controllability, or the ability to manipulate one or more attributes of the generated data, has become a topic of interest in the context of deep generative models of music. Recent attempts in this direction have relied on learning disentangled representations from data such that the underlying factors of variation are well separated. In this paper, we focus on the relationship between disentanglement and controllability by conducting a systematic study using different supervised disentanglement learning algorithms based on the Variational Auto-Encoder (VAE) architecture. Our experiments show that a high degree of disentanglement can be achieved by using different forms of supervision to train a strong discriminative encoder. However, in the absence of a strong generative decoder, disentanglement does not necessarily imply controllability. The structure of the latent space with…
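The visible abstract does not name the specific supervised disentanglement algorithms that were compared. The sketch below shows one illustrative form of attribute supervision for a VAE: an attribute-regularization term in the style of AR-VAE, added to the usual reconstruction and KL terms. It pulls the ordering of one latent dimension across a batch toward the ordering of a scalar attribute. Several details are assumptions and may not match the paper: whether this exact loss is used, the function names, and the `delta`, `beta`, and `gamma` defaults. A real implementation would use batched tensor operations, for example in PyTorch as in the linked repository.

```python
import math


def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def attribute_regularization_loss(z_r, attrs, delta=10.0):
    """AR-VAE-style term (assumed form): mean absolute error between
    tanh(delta * pairwise differences of the regularized latent dimension)
    and the sign of the pairwise attribute differences, over all batch pairs."""
    n = len(z_r)
    total = 0.0
    for i in range(n):
        for j in range(n):
            latent_order = math.tanh(delta * (z_r[i] - z_r[j]))
            attr_order = sign(attrs[i] - attrs[j])
            total += abs(latent_order - attr_order)
    return total / (n * n)


def gaussian_kl(mu, logvar):
    """KL(N(mu, exp(logvar)) || N(0, I)) for one sample, summed over dims."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def vae_objective(recon_loss, mu, logvar, z_r, attrs, beta=1.0, gamma=1.0):
    """Hypothetical total loss: reconstruction + beta * KL + gamma * regularizer.
    `mu` and `logvar` are per-sample lists of latent statistics; `z_r` holds
    the regularized latent value and `attrs` the attribute for each sample."""
    kl = sum(gaussian_kl(m, lv) for m, lv in zip(mu, logvar)) / len(mu)
    return recon_loss + beta * kl + gamma * attribute_regularization_loss(z_r, attrs)
```

A batch whose regularized latent values are ordered like the attribute gives a loss near 0. A reversed ordering gives a loss near 2 for each off-diagonal pair. Such a term shapes only the encoder's latent space, which is why the abstract stresses that a strong decoder is still needed for control.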

Code & Models

Repositories

ashispati/dmelodies_controllability
PyTorch · Official

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Generative Adversarial Networks and Image Synthesis