Combining audio control and style transfer using latent diffusion

Nils Demerlé; Philippe Esling; Guillaume Doras; David Genova

arXiv:2408.00196 · cs.SD · August 2, 2024 · 2 cites

Open Access

TL;DR

This paper introduces a diffusion autoencoder-based model that unifies explicit control and style transfer for audio, enabling high-quality, controllable music synthesis and style transfer with improved fidelity.

Contribution

The paper separates local features (musical structure) from global features (timbre) using diffusion autoencoders and an adversarial criterion, so that a single model supports both explicit control and style transfer.

Findings

01. Outperforms baselines in timbre transfer quality
02. Achieves high fidelity in MIDI-to-audio conversion
03. Generates cover versions with genre transfer

Abstract

Deep generative models are now able to synthesize high-quality audio signals, shifting the critical aspect in their development from audio quality to control capabilities. Although text-to-music generation is being widely adopted by the general public, explicit control and example-based style transfer are better suited to capturing the intents of artists and musicians. In this paper, we aim to unify explicit control and style transfer within a single model by separating local and global information to capture musical structure and timbre respectively. To do so, we leverage the capabilities of diffusion autoencoders to extract semantic features, in order to build two representation spaces. We enforce disentanglement between those spaces using an adversarial criterion and a two-stage training strategy. Our resulting model can generate audio matching a timbre target, while…

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing

Methods: Diffusion