Timbre transfer using image-to-image denoising diffusion implicit models
Luca Comanducci, Fabio Antonacci, Augusto Sarti

TL;DR
This paper introduces a novel application of Denoising Diffusion Implicit Models (DDIMs) for timbre transfer in audio, converting instrument sounds while preserving musical content, and demonstrates its effectiveness through subjective and objective evaluations.
Contribution
The paper adapts DDIMs for audio timbre transfer by transforming audio into spectrograms and conditioning generation, enabling both one-to-one and many-to-many transfer tasks.
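The audio-to-spectrogram step can be sketched as follows. This is a minimal NumPy illustration of a log mel spectrogram, the representation named in the abstract; it is not the authors' preprocessing code. The function names and all parameter values (sample rate, FFT size, hop length, number of mel bands) are assumptions chosen for the example.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other mel formulas exist)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters whose centers are equally spaced on the mel scale
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        lo, ctr, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - lo) / (ctr - lo)
        falling = (hi - fft_freqs) / (hi - ctr)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(audio, sr=16000, n_fft=1024, hop=256, n_mels=64, eps=1e-6):
    # Short-time Fourier transform with a Hann window (no padding)
    window = np.hanning(n_fft)
    n_frames = 1 + (len(audio) - n_fft) // hop
    frames = np.stack([audio[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (n_frames, n_bins)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T     # (n_frames, n_mels)
    # Log compression; eps avoids log(0) on silent frames
    return np.log(mel + eps).T                            # (n_mels, n_frames)
```

The result is a 2-D array that can be treated like a single-channel image, which is what lets an image-to-image model operate on audio.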
Findings
Outperforms existing methods in listening tests
Achieves comparable or better objective metrics
Enables efficient many-to-many timbre transfer
Abstract
Timbre transfer techniques aim to convert the sound of a musical piece played by one instrument into the sound of the same piece as if it were played by another instrument, while preserving as much as possible of the musical content, such as melody and dynamics. Following their recent breakthroughs in deep learning-based generation, we apply Denoising Diffusion Models (DDMs) to perform timbre transfer. Specifically, we apply the recently proposed Denoising Diffusion Implicit Models (DDIMs), which accelerate the sampling procedure. Inspired by the recent application of DDMs to image translation problems, we formulate the timbre transfer task similarly: we first convert the audio tracks into log mel spectrograms and then condition the generation of the desired timbre spectrogram on the input timbre spectrogram. We perform both one-to-one and many-to-many…
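The conditioned sampling idea in the abstract can be sketched as a deterministic DDIM loop. This is a minimal illustration under stated assumptions, not the authors' implementation: `eps_model` is a hypothetical noise-prediction network, `alpha_bar` is a hypothetical cumulative noise schedule, and conditioning by stacking the noisy target with the input-timbre spectrogram along a channel axis is one common image-to-image choice assumed here; the abstract does not specify the mechanism.

```python
import numpy as np

def ddim_sample(eps_model, cond, alpha_bar, timesteps, rng):
    """Deterministic (eta = 0) DDIM sampling conditioned on `cond`.

    eps_model : hypothetical callable (model_in, t) -> predicted noise,
                where model_in has shape (2, *cond.shape)
    cond      : input-timbre log mel spectrogram
    alpha_bar : cumulative products of (1 - beta_t), values in (0, 1)
    timesteps : decreasing subset of diffusion step indices
    """
    # Start from pure Gaussian noise with the shape of the target spectrogram
    x = rng.standard_normal(cond.shape)
    for i, t in enumerate(timesteps):
        a_t = alpha_bar[t]
        # After the last step there is no noise left, so alpha_bar is 1
        a_prev = alpha_bar[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        # Assumed conditioning: stack noisy target and input timbre as channels
        eps = eps_model(np.stack([x, cond], axis=0), t)
        # Estimate the clean spectrogram implied by the predicted noise
        x0 = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        # Move to the previous (less noisy) step without adding fresh noise
        x = np.sqrt(a_prev) * x0 + np.sqrt(1.0 - a_prev) * eps
    return x
```

Because `timesteps` may be a short subsequence of the training steps (for example 20 out of 1000), the loop needs far fewer network evaluations than step-by-step ancestral sampling, which is the acceleration that DDIMs provide.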
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Image and Signal Denoising Methods
Methods: Diffusion
