Diffusion Timbre Transfer Via Mutual Information Guided Inpainting

Ching Ho Lee; Javier Nistal; Stefan Lattner; Marco Pasini; George Fazekas

arXiv:2601.01294 · cs.SD · January 29, 2026

TL;DR

This paper presents an inference-time method for timbre transfer in music audio built on a pre-trained latent diffusion model: dimension-wise noise injection changes the instrument timbre, and early-step clamping preserves the input's melodic and rhythmic structure.

Contribution

The paper introduces a lightweight, training-free approach to timbre transfer based on mutual-information-guided inpainting of audio latents. The approach is compatible with text/audio conditioning.
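
One way to picture the "mutual information-guided" part is to rank latent channels by how much they reveal about instrument identity. The sketch below is an illustration only, not the authors' procedure: the histogram estimator, the per-clip time average used as a channel feature, and the names `mutual_information` and `rank_channels` are all assumptions introduced here.

```python
import numpy as np

def mutual_information(x, labels, bins=8):
    """Plug-in histogram estimate of I(x; labels) in nats.

    x: 1-D array of scalar features; labels: 1-D array of class ids.
    Assumption: a simple binned estimator stands in for whatever
    estimator the paper actually uses.
    """
    x = np.asarray(x, dtype=float).ravel()
    labels = np.asarray(labels).ravel()
    edges = np.histogram_bin_edges(x, bins=bins)
    # Inner edges map every value to a bin index in [0, bins - 1].
    x_bin = np.clip(np.digitize(x, edges[1:-1]), 0, bins - 1)
    classes, y = np.unique(labels, return_inverse=True)
    joint = np.zeros((bins, len(classes)))
    np.add.at(joint, (x_bin, y), 1.0)   # joint counts
    joint /= joint.sum()                # joint probabilities
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))

def rank_channels(latents, labels, bins=8):
    """Rank latent channels by estimated MI with the instrument label.

    latents: array of shape (N, C, T), one latent per clip (hypothetical layout).
    Returns (channel indices sorted by descending MI, per-channel scores).
    """
    feats = latents.mean(axis=2)  # (N, C): time-averaged channel value per clip
    scores = np.array([
        mutual_information(feats[:, c], labels, bins)
        for c in range(feats.shape[1])
    ])
    return np.argsort(-scores), scores
```

The top-ranked channels would then be the candidates for noise injection, while the remaining channels are treated as carrying structure.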

Findings

01. Effective timbre transfer with structural preservation

02. Inference-time controls enable style steering

03. Compatible with text/audio conditioning models

Abstract

We study timbre transfer as an inference-time editing problem for music audio. Starting from a strong pre-trained latent diffusion model, we introduce a lightweight procedure that requires no additional training: (i) a dimension-wise noise injection that targets latent channels most informative of instrument identity, and (ii) an early-step clamping mechanism that re-imposes the input's melodic and rhythmic structure during reverse diffusion. The method operates directly on audio latents and is compatible with text/audio conditioning (e.g., CLAP). We discuss design choices, analyze trade-offs between timbral change and structural preservation, and show that simple inference-time controls can meaningfully steer pre-trained models for style-transfer use cases.
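
The two steps in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the latent layout `(C, T)`, the noise schedule `sigmas`, the callback `denoise_step` (a stand-in for one reverse step of the pre-trained model), and the choice to clamp by overwriting the non-selected channels with the input latent are all hypothetical.

```python
import numpy as np

def inject_noise(z0, channels, sigma, rng):
    """Add Gaussian noise of scale sigma to the selected channels only."""
    z = z0.copy()
    z[channels] += sigma * rng.standard_normal(z[channels].shape)
    return z

def timbre_edit(z0, channels, sigmas, denoise_step, clamp_steps, rng):
    """Dimension-wise noise injection followed by early-step clamping.

    z0:           input latent, shape (C, T) (assumed layout)
    channels:     indices of channels judged informative of instrument identity
    sigmas:       decreasing noise levels, one more entry than reverse steps
    denoise_step: callable (z, sigma, sigma_next) -> z; hypothetical stand-in
                  for the pre-trained diffusion model's reverse update
    clamp_steps:  number of initial reverse steps after which the
                  non-selected channels are reset to the input latent
    """
    keep = np.ones(z0.shape[0], dtype=bool)
    keep[channels] = False                     # channels treated as structure
    z = inject_noise(z0, channels, sigmas[0], rng)
    for i in range(len(sigmas) - 1):
        z = denoise_step(z, sigmas[i], sigmas[i + 1])
        if i < clamp_steps:
            z[keep] = z0[keep]                 # re-impose the input structure
    return z
```

In this sketch, a larger `sigmas[0]` or a wider `channels` set pushes toward more timbral change, while a larger `clamp_steps` pushes toward structural preservation, which mirrors the trade-off the abstract describes.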

Taxonomy

Topics: Music Technology and Sound Studies · Music and Audio Processing · Generative Adversarial Networks and Image Synthesis