Timbre transfer using image-to-image denoising diffusion implicit models

Luca Comanducci; Fabio Antonacci; Augusto Sarti

arXiv:2307.04586 · eess.AS · July 31, 2023 · ISMIR · 1 citation

TL;DR

This paper introduces a novel application of Denoising Diffusion Implicit Models (DDIMs) for timbre transfer in audio, converting instrument sounds while preserving musical content, and demonstrates its effectiveness through subjective and objective evaluations.

Contribution

The paper adapts DDIMs to audio timbre transfer by converting audio into log mel spectrograms and conditioning the generation of the target-timbre spectrogram on the input-timbre spectrogram. This formulation supports both one-to-one and many-to-many transfer.
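The first step of this formulation, turning an audio track into a log mel spectrogram "image", can be sketched with NumPy alone. This is a minimal sketch under stated assumptions: the sample rate (16 kHz), FFT size, hop length, number of mel bands, the HTK-style mel scale and the log offset `eps` are illustrative choices, not the paper's reported settings, and `log_mel_spectrogram` and its helpers are hypothetical names.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (assumed; the paper's exact mel variant is not stated here).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters mapping the n_fft // 2 + 1 STFT bins onto n_mels mel bands."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    # n_mels + 2 equally spaced mel points give each band a left, center and right edge.
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(y, sr=16000, n_fft=1024, hop=256, n_mels=128, eps=1e-6):
    """Return an (n_mels, n_frames) log mel spectrogram of the mono signal y."""
    if len(y) < n_fft:
        raise ValueError("signal shorter than one analysis frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, n_fft // 2 + 1)
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T  # (n_mels, n_frames)
    return np.log(mel + eps)  # eps avoids log(0) in silent bands


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr  # one second of audio
    y = np.sin(2 * np.pi * 440.0 * t)  # a 440 Hz test tone
    S = log_mel_spectrogram(y, sr=sr)
    print(S.shape)  # (128, 59)
```

The resulting array can be treated as a single-channel image, which is what lets an image-to-image diffusion model operate on audio. Because a log mel spectrogram discards phase, some inverse step is needed to return to a waveform; that step is not shown here.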

Findings

01. Outperforms existing methods in listening tests
02. Achieves comparable or better objective metrics
03. Enables efficient many-to-many timbre transfer

Abstract

Timbre transfer techniques aim to convert the sound of a musical piece played by one instrument into the same piece as if it were played by another instrument, while preserving as much as possible of its musical content, such as melody and dynamics. Following their recent breakthroughs in deep learning-based generation, we apply Denoising Diffusion Models (DDMs) to perform timbre transfer. Specifically, we apply the recently proposed Denoising Diffusion Implicit Models (DDIMs), which accelerate the sampling procedure. Inspired by the recent application of DDMs to image translation problems, we formulate the timbre transfer task similarly: we first convert the audio tracks into log mel spectrograms and then condition the generation of the spectrogram in the desired timbre on the spectrogram in the input timbre. We perform both one-to-one and many-to-many…
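Conditioned DDIM sampling of the target-timbre spectrogram can be sketched as follows. This is a hedged illustration, not the authors' implementation. It assumes the deterministic DDIM update (eta = 0), a linear DDPM-style noise schedule, and conditioning by stacking the noisy target with the input-timbre spectrogram along a channel axis, a common choice in image-to-image diffusion. `eps_model` stands in for a trained noise-prediction network, and the names `ddim_sample`, `linear_alpha_bars` and `oracle_eps` are hypothetical.

```python
import numpy as np


def linear_alpha_bars(T=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative products alpha_bar_t for a linear beta schedule (assumed, DDPM-style)."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)


def ddim_sample(eps_model, cond, alpha_bars, n_steps=50, seed=0):
    """Deterministic DDIM sampling (eta = 0) of a spectrogram conditioned on `cond`.

    eps_model(net_in, t) must return the predicted noise for the target channel,
    where net_in stacks [noisy target, conditioning spectrogram] (assumed layout).
    """
    rng = np.random.default_rng(seed)
    T = len(alpha_bars)
    # A decreasing subsequence of timesteps; skipping steps is what speeds up DDIM.
    timesteps = np.linspace(T - 1, 0, n_steps).round().astype(int)
    x = rng.standard_normal(cond.shape)  # start from pure Gaussian noise
    for i, t in enumerate(timesteps):
        a_t = alpha_bars[t]
        # alpha_bar for the next (less noisy) step; 1.0 means "fully denoised".
        a_prev = alpha_bars[timesteps[i + 1]] if i + 1 < n_steps else 1.0
        net_in = np.stack([x, cond])  # channel-wise conditioning on the input timbre
        eps = eps_model(net_in, t)
        # Estimate the clean target, then move to the next timestep deterministically.
        x0_pred = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        x = np.sqrt(a_prev) * x0_pred + np.sqrt(1.0 - a_prev) * eps
    return x


if __name__ == "__main__":
    alpha_bars = linear_alpha_bars()
    cond = np.random.default_rng(1).standard_normal((128, 64))  # input-timbre log mel

    def oracle_eps(net_in, t):
        # Hypothetical stand-in for a trained network: it "knows" the target is an
        # affine function of the conditioning and returns the exact noise.
        x_t, c = net_in
        target = 0.5 * c + 1.0
        a_t = alpha_bars[t]
        return (x_t - np.sqrt(a_t) * target) / np.sqrt(1.0 - a_t)

    out = ddim_sample(oracle_eps, cond, alpha_bars, n_steps=25)
    print(np.allclose(out, 0.5 * cond + 1.0))  # True
```

Sampling only `n_steps` of the `T` timesteps is what makes DDIM faster than running the full DDPM chain. With a perfect noise predictor, every step recovers the clean target exactly, which the oracle in the demo exploits to check the update rule; a real model would only approximate it.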

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Image and Signal Denoising Methods

Methods: Diffusion