Diff-TONE: Timestep Optimization for iNstrument Editing in Text-to-Music Diffusion Models
Teysir Baoueb, Xiaoyu Bie, Xi Wang, Gaël Richard

TL;DR
This paper introduces Diff-TONE, a method for instrument editing in text-to-music diffusion models that optimizes the editing process by selecting an intermediate timestep, preserving content while changing instrument timbre without extra training.
Contribution
The paper proposes a timestep optimization technique for instrument editing with pretrained text-to-music diffusion models. It improves control over the edit without additional training and without slowing down generation.
Findings
Intermediate timestep selection improves instrument editing quality.
The method preserves original content while changing instrument timbre.
The method requires no additional training and does not slow down generation.
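
The role of the intermediate timestep can be illustrated with a toy sketch. This is not the authors' implementation. It assumes one possible reading of the findings: denoising is conditioned on a source-instrument prompt during the early, structure-forming steps and on a target-instrument prompt from the chosen step onward. The names `prompt_schedule`, `run_edit`, and `toy_denoise_step` are hypothetical, the latent is a single number, and the toy denoiser stands in for a real pretrained model.

```python
from typing import Callable, List


def prompt_schedule(num_steps: int, switch_step: int,
                    source_prompt: str, target_prompt: str) -> List[str]:
    """Return the prompt used at each denoising step, noisiest step first.

    Steps before `switch_step` use the source prompt (assumed to fix the
    content); the remaining steps use the target prompt (assumed to set
    the instrument).
    """
    if not 0 <= switch_step <= num_steps:
        raise ValueError("switch_step must lie in [0, num_steps]")
    return [source_prompt if step < switch_step else target_prompt
            for step in range(num_steps)]


def run_edit(latent: float,
             denoise_step: Callable[[float, int, str], float],
             schedule: List[str]) -> float:
    """Apply one denoising update per step, conditioned on that step's prompt."""
    for step, prompt in enumerate(schedule):
        latent = denoise_step(latent, step, prompt)
    return latent


def toy_denoise_step(latent: float, step: int, prompt: str) -> float:
    """Hypothetical stand-in for a model: move halfway toward a per-prompt value."""
    target = {"piano": 1.0, "guitar": -1.0}[prompt]
    return latent + 0.5 * (target - latent)


# Switch from the source to the target instrument after 2 of 6 steps.
schedule = prompt_schedule(6, 2, "piano", "guitar")
edited = run_edit(0.0, toy_denoise_step, schedule)
```

In this sketch the editing loop has the same number of steps as ordinary generation, which mirrors the claim that editing adds no training and no extra sampling cost. How the switch step is actually selected is not specified in the summary above, so it is left as a plain argument.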
Abstract
Breakthroughs in text-to-music generation models are transforming the creative landscape, equipping musicians with innovative tools for composition and experimentation like never before. However, controlling the generation process to achieve a specific desired outcome remains a significant challenge. Even a minor change in the text prompt, combined with the same random seed, can drastically alter the generated piece. In this paper, we explore the application of existing text-to-music diffusion models for instrument editing. Specifically, for an existing audio track, we aim to leverage a pretrained text-to-music diffusion model to edit the instrument while preserving the underlying content. Based on the insight that the model first focuses on the overall structure or content of the audio, then adds instrument information, and finally refines the quality, we show that selecting a…
Taxonomy
Topics: Music and Audio Processing · Speech Recognition and Synthesis
Methods: Diffusion
