AudioEditor: A Training-Free Diffusion-Based Audio Editing Framework

Yuhang Jia; Yang Chen; Jinghua Zhao; Shiwan Zhao; Wenjia Zeng; Yong Chen; Yong Qin

arXiv:2409.12466 · cs.SD · October 1, 2024

TL;DR

AudioEditor is a training-free framework for precise, high-quality audio editing with pretrained diffusion models. It targets the two main challenges of the task: executing accurate edits and preserving the unedited parts of the audio.

Contribution

The paper introduces a training-free audio editing method that applies Null-text Inversion and EOT-suppression to a pretrained diffusion model, an approach new to audio editing.

Findings

01. Effective preservation of original audio features during editing

02. High-quality audio edits validated through objective and subjective tests

03. Demonstrates the feasibility of training-free audio editing with diffusion models

Abstract

Diffusion-based text-to-audio (TTA) generation has made substantial progress, leveraging latent diffusion models (LDMs) to produce high-quality, diverse, and instruction-relevant audio. However, beyond generation, the task of audio editing is equally important but has received comparatively little attention. Audio editing faces two primary challenges: executing precise edits and preserving the unedited sections. While workflows based on LDMs have effectively addressed these challenges in image processing, similar approaches have scarcely been applied to audio editing. In this paper, we introduce AudioEditor, a training-free audio editing framework built on a pretrained diffusion-based TTA model. AudioEditor incorporates Null-text Inversion and EOT-suppression methods, enabling the model to preserve original audio features while executing accurate edits.…
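
For readers unfamiliar with Null-text Inversion, the toy sketch below illustrates the general idea as it is known from image editing: with a large classifier-free guidance scale, a denoising step drifts away from the latent recorded during inversion, and optimizing the unconditional ("null") embedding pulls it back. This is not the AudioEditor implementation. The scalar linear `predict_noise`, the simplified `guided_step`, and every constant are hypothetical stand-ins chosen so the example runs without a real diffusion model.

```python
# Toy illustration of the null-text optimization idea (NOT the AudioEditor code).
# Every function and constant below is a hypothetical stand-in.

def predict_noise(z, emb):
    """Stand-in for a diffusion noise predictor: a fixed linear map."""
    return 0.5 * z + 0.3 * emb


def guided_step(z, null_emb, cond_emb, scale, step_size=0.1):
    """One simplified denoising step with classifier-free guidance."""
    eps_uncond = predict_noise(z, null_emb)
    eps_cond = predict_noise(z, cond_emb)
    eps = eps_uncond + scale * (eps_cond - eps_uncond)
    return z - step_size * eps


def optimize_null_embedding(z, target, cond_emb, scale,
                            null_init=0.0, lr=10.0, iters=100, step_size=0.1):
    """Gradient descent on the null embedding so the guided step hits `target`."""
    null_emb = null_init
    for _ in range(iters):
        err = guided_step(z, null_emb, cond_emb, scale, step_size) - target
        # Analytic gradient of err**2 w.r.t. null_emb for this linear toy model.
        grad = 2.0 * err * (-step_size * (1.0 - scale) * 0.3)
        null_emb -= lr * grad
    return null_emb


z_t, cond = 1.0, 2.0
# "Pivot" latent: what a guidance-scale-1 step produces (stands in for the
# latent recorded along the inversion trajectory).
pivot = guided_step(z_t, 0.0, cond, scale=1.0)
# With a large guidance scale and the default null embedding, the step drifts.
drift = abs(guided_step(z_t, 0.0, cond, scale=7.5) - pivot)
# After optimizing the null embedding, the guided step lands on the pivot again.
null_opt = optimize_null_embedding(z_t, pivot, cond, scale=7.5)
residual = abs(guided_step(z_t, null_opt, cond, scale=7.5) - pivot)
```

In an actual pipeline the optimization would presumably run once per timestep against the stored inversion latents, with the embedding being a tensor rather than a scalar; the official repository listed under Code & Models is the reference for how AudioEditor does this.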

Code & Models

Repositories

nku-hlt/audioeditor (PyTorch, official)

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies