Unsupervised Composable Representations for Audio
Giovanni Bindi, Philippe Esling

TL;DR
This paper introduces an unsupervised framework for compositional audio representations that leverages auto-encoding and diffusion models, enabling high-quality source separation and generation with lower computational costs.
Contribution
The paper presents an unsupervised approach that combines an auto-encoding objective with diffusion models to learn compositional audio representations, and applies it to source separation and generation.
Findings
Achieves comparable or better source separation than existing methods.
Surpasses supervised baselines on signal-to-interference ratio metrics.
Operates efficiently in the latent space of neural audio codecs.
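For readers unfamiliar with the signal-to-interference ratio (SIR) mentioned above, the following is a minimal sketch of the common BSS Eval style definition with time-invariant gains. It is not taken from the paper's evaluation code; the function name `sir_db` and the toy signals are assumptions made for illustration only.

```python
import numpy as np

def sir_db(estimate, sources, target_idx, eps=1e-12):
    """Simplified SIR in dB (illustrative, not the paper's evaluation code).

    estimate:   1-D array, the separated signal for one source
    sources:    2-D array of shape (K, T), the K reference sources
    target_idx: index of the reference source the estimate should match
    """
    S = np.asarray(sources, dtype=float)
    est = np.asarray(estimate, dtype=float)
    target = S[target_idx]

    # Part of the estimate explained by the target source alone
    s_target = (est @ target) / (target @ target) * target

    # Part of the estimate explained by all reference sources together
    coeffs, *_ = np.linalg.lstsq(S.T, est, rcond=None)
    projection = S.T @ coeffs

    # Interference: what the other sources contribute beyond the target
    e_interf = projection - s_target

    return 10.0 * np.log10((s_target @ s_target + eps) / (e_interf @ e_interf + eps))

# Toy example: two orthogonal sources, the estimate leaks 10% of the second one
s1 = np.array([1.0, 0.0, 1.0, 0.0])
s2 = np.array([0.0, 1.0, 0.0, 1.0])
leaky = s1 + 0.1 * s2
cleaner = s1 + 0.01 * s2
print(sir_db(leaky, [s1, s2], 0), sir_db(cleaner, [s1, s2], 0))
```

A higher value means less leakage from the other sources into the estimate, which is why SIR is a natural metric for comparing separation systems.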
Abstract
Current generative models are able to generate high-quality artefacts but have been shown to struggle with compositional reasoning, which can be defined as the ability to generate complex structures from simpler elements. In this paper, we focus on the problem of compositional representation learning for music data, specifically targeting the fully-unsupervised setting. We propose a simple and extensible framework that leverages an explicit compositional inductive bias, defined by a flexible auto-encoding objective that can leverage any of the current state-of-art generative models. We demonstrate that our framework, used with diffusion models, naturally addresses the task of unsupervised audio source separation, showing that our model is able to perform high-quality separation. Our findings reveal that our proposal achieves comparable or superior performance with respect to other blind…
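The compositional auto-encoding idea described in the abstract can be illustrated with a toy sketch: an encoder maps a mixture to several latents, each latent is decoded separately, and a composition operator recombines the decoded parts so that the result can be compared with the input. Everything below is an assumption made for illustration: the linear encoder and decoder, the use of a sum as the composition operator, and all names and dimensions are hypothetical. The paper itself uses diffusion models and works in the latent space of a neural audio codec, which this sketch does not attempt to reproduce.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K, H = 16, 2, 4  # signal dimension, number of components, latent size (all hypothetical)

# Hypothetical linear weights standing in for learned networks
W_enc = rng.normal(scale=0.1, size=(K, H, D))  # one encoder head per component
W_dec = rng.normal(scale=0.1, size=(H, D))     # decoder shared across components

def encode(x):
    """Map a mixture of shape (D,) to K latents of shape (K, H)."""
    return W_enc @ x

def decode(z):
    """Decode each latent independently: (K, H) -> (K, D)."""
    return z @ W_dec

def compose(parts):
    """Assumed composition operator: sum the decoded components."""
    return parts.sum(axis=0)

def reconstruction_loss(x):
    """Mean squared error between the mixture and the recomposed parts."""
    parts = decode(encode(x))
    loss = float(np.mean((compose(parts) - x) ** 2))
    return loss, parts

x = rng.normal(size=D)  # stand-in for a mixture signal
loss, parts = reconstruction_loss(x)
print(loss, parts.shape)
```

In a setup like this, minimizing the reconstruction loss is the only training signal, so no isolated sources are needed; after training, the individual decoded parts would play the role of separated sources.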
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
Methods: Diffusion · Focus
