AudioEditor: A Training-Free Diffusion-Based Audio Editing Framework
Yuhang Jia, Yang Chen, Jinghua Zhao, Shiwan Zhao, Wenjia Zeng, Yong Chen, Yong Qin

TL;DR
AudioEditor is a novel training-free framework for precise, high-quality audio editing with pretrained diffusion models. It addresses the two main challenges of the task: executing accurate edits and preserving the unedited parts of the audio.
Contribution
It introduces a training-free audio editing method that applies Null-text Inversion and EOT-suppression to pretrained diffusion models, an approach new to audio editing.
Findings
Effective preservation of original audio features during editing
High-quality audio edits validated through objective and subjective tests
Demonstrated feasibility of training-free audio editing with diffusion models
Abstract
Diffusion-based text-to-audio (TTA) generation has made substantial progress, leveraging latent diffusion models (LDMs) to produce high-quality, diverse, and instruction-relevant audio. However, beyond generation, the task of audio editing remains equally important but has received comparatively little attention. Audio editing tasks face two primary challenges: executing precise edits and preserving the unedited sections. While workflows based on LDMs have effectively addressed these challenges in the field of image processing, similar approaches have scarcely been applied to audio editing. In this paper, we introduce AudioEditor, a training-free audio editing framework built on a pretrained diffusion-based TTA model. AudioEditor incorporates Null-text Inversion and EOT-suppression methods, enabling the model to preserve original audio features while executing accurate edits.…
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies
