Evaluating Disentangled Representations for Controllable Music Generation
Laura Ibáñez-Martínez, Chukwuemeka Nkama, Andrea Poltronieri, Xavier Serra, Martín Rocamora

TL;DR
This paper critically evaluates the effectiveness of disentangled representations in controllable music generation, revealing inconsistencies and limitations in current strategies through a comprehensive probing framework.
Contribution
It introduces a probing-based framework to assess disentangled music representations, analyzing diverse strategies and revealing their shortcomings in achieving true disentanglement.
Findings
Current strategies often produce embeddings with inconsistent semantics.
Disentanglement strategies fall short of achieving true disentanglement.
The analysis highlights the need for re-examining controllability approaches in music generation.
Abstract
Recent approaches in music generation rely on disentangled representations, often labeled as structure and timbre or local and global, to enable controllable synthesis. Yet the underlying properties of these embeddings remain underexplored. In this work, we evaluate such disentangled representations in a set of music audio models for controllable generation using a probing-based framework that goes beyond standard downstream tasks. The selected models reflect diverse unsupervised disentanglement strategies, including inductive biases, data augmentations, adversarial objectives, and staged training procedures. We further isolate specific strategies to analyze their effect. Our analysis spans four key axes: informativeness, equivariance, invariance, and disentanglement, which are assessed across datasets, tasks, and controlled transformations. Our findings reveal inconsistencies between…
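As a rough illustration of what a probing-based evaluation can look like, the sketch below trains a simple probe on each embedding to predict each attribute, then compares the accuracies. It is a toy under stated assumptions, not the paper's framework. The synthetic "structure" and "timbre" embeddings, the pitch and instrument labels, the nearest-centroid probe, and all function names are hypothetical choices made for this example. High accuracy on the intended attribute suggests informativeness. High accuracy on the other attribute suggests leakage, that is, incomplete disentanglement.

```python
import random


def nearest_centroid_accuracy(train_x, train_y, test_x, test_y):
    """Fit one centroid per class on the training split; return test accuracy."""
    sums, counts = {}, {}
    for x, y in zip(train_x, train_y):
        total = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            total[i] += value
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in total] for y, total in sums.items()}

    def predict(x):
        # Pick the class whose centroid is closest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
        )

    hits = sum(predict(x) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)


def one_hot(index, size=3, scale=1.0):
    return [scale if i == index else 0.0 for i in range(size)]


def make_toy_data(n, rng, leak=0.8, noise=0.1):
    """Hypothetical embeddings (assumption, not from the paper).

    'structure' encodes pitch only; 'timbre' encodes instrument plus a
    leaked copy of pitch in extra dimensions.
    """
    pitch, instrument, structure, timbre = [], [], [], []
    for _ in range(n):
        p, t = rng.randrange(3), rng.randrange(3)
        pitch.append(p)
        instrument.append(t)
        structure.append([v + rng.gauss(0, noise) for v in one_hot(p)])
        leaky = one_hot(t) + one_hot(p, scale=leak)  # list concatenation
        timbre.append([v + rng.gauss(0, noise) for v in leaky])
    return pitch, instrument, structure, timbre


def probe_all(n_train=600, n_test=300, seed=0):
    """Probe every (embedding, attribute) pair and return the accuracies."""
    rng = random.Random(seed)
    tr_p, tr_t, tr_s, tr_z = make_toy_data(n_train, rng)
    te_p, te_t, te_s, te_z = make_toy_data(n_test, rng)
    return {
        ("structure", "pitch"): nearest_centroid_accuracy(tr_s, tr_p, te_s, te_p),
        ("structure", "instrument"): nearest_centroid_accuracy(tr_s, tr_t, te_s, te_t),
        ("timbre", "instrument"): nearest_centroid_accuracy(tr_z, tr_t, te_z, te_t),
        ("timbre", "pitch"): nearest_centroid_accuracy(tr_z, tr_p, te_z, te_p),
    }


if __name__ == "__main__":
    for (embedding, attribute), accuracy in probe_all().items():
        print(f"{embedding:>9} -> {attribute:<10} accuracy = {accuracy:.2f}")
```

In this toy setup the "timbre" embedding is built to leak pitch, so a probe recovers pitch from it well above the three-class chance level, while the "structure" embedding carries no instrument information. Real evaluations would replace the synthetic vectors with embeddings from trained models and would likely use stronger probes.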
Taxonomy
Topics: Music Technology and Sound Studies · Generative Adversarial Networks and Image Synthesis · Music and Audio Processing
