Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion

Hila Manor; Tomer Michaeli

arXiv:2402.10009 · cs.SD · May 30, 2024 · 1 citation

TL;DR

This paper introduces two zero-shot audio editing methods built on DDPM inversion with pre-trained diffusion models: text-based editing (ZETA) and unsupervised discovery of semantically meaningful editing directions (ZEUS), demonstrated on music signals.

Contribution

It presents the first zero-shot audio editing techniques leveraging DDPM inversion, including a novel unsupervised method for discovering meaningful editing directions.

Findings

1. Enables semantic audio editing without training data
2. Demonstrates control over instruments and melody in music signals
3. Provides open-source samples and code

Abstract

Editing signals using large pre-trained models, in a zero-shot manner, has recently seen rapid advancements in the image domain. However, this wave has yet to reach the audio domain. In this paper, we explore two zero-shot editing techniques for audio signals, which use DDPM inversion with pre-trained diffusion models. The first, which we coin ZEro-shot Text-based Audio (ZETA) editing, is adopted from the image domain. The second, named ZEro-shot UnSupervized (ZEUS) editing, is a novel approach for discovering semantically meaningful editing directions without supervision. When applied to music signals, this method exposes a range of musically interesting modifications, from controlling the participation of specific instruments to improvisations on the melody. Samples and code can be found at https://hilamanor.github.io/AudioEditing/ .
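
To make the idea of DDPM inversion more concrete, here is a minimal toy sketch in Python (NumPy only). It is not the authors' implementation. It assumes the edit-friendly formulation from the image domain, in which each noisy state is drawn with independent noise and the per-step noise vectors are then solved for so that the stochastic sampler reproduces the input. The names `make_schedule`, `toy_eps_model`, `posterior_mean`, `ddpm_invert`, and `ddpm_sample` are hypothetical, the schedule values are illustrative, the step variance is assumed to be beta_t, and `toy_eps_model` is only a stand-in for a pre-trained noise predictor.

```python
import numpy as np

def make_schedule(num_steps=50, beta_start=1e-4, beta_end=2e-2):
    # Illustrative linear noise schedule (values are assumptions).
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def toy_eps_model(x, k):
    # Hypothetical stand-in for a pre-trained noise-prediction network.
    return np.tanh(x) * (1.0 + 0.01 * k)

def posterior_mean(x, k, eps_model, betas, alphas, alpha_bars):
    # Standard DDPM reverse-step mean, computed from the predicted noise.
    eps = eps_model(x, k)
    return (x - betas[k] / np.sqrt(1.0 - alpha_bars[k]) * eps) / np.sqrt(alphas[k])

def ddpm_invert(x0, eps_model, betas, alphas, alpha_bars, rng):
    num_steps = len(betas)
    # xs[0] is the clean signal; xs[k + 1] is its noisy version at schedule
    # index k, each drawn with independent noise.
    xs = [x0]
    for k in range(num_steps):
        noise = rng.standard_normal(x0.shape)
        xs.append(np.sqrt(alpha_bars[k]) * x0 + np.sqrt(1.0 - alpha_bars[k]) * noise)
    # Solve for the noise vector that maps each state onto the previous one.
    zs = []
    for k in range(num_steps - 1, -1, -1):
        mu = posterior_mean(xs[k + 1], k, eps_model, betas, alphas, alpha_bars)
        zs.append((xs[k] - mu) / np.sqrt(betas[k]))
    return xs[-1], zs  # zs runs from the noisiest step down to the cleanest

def ddpm_sample(x_last, zs, eps_model, betas, alphas, alpha_bars):
    # Run the reverse process with the extracted noise vectors instead of
    # freshly sampled ones.
    x = x_last
    for z, k in zip(zs, range(len(betas) - 1, -1, -1)):
        mu = posterior_mean(x, k, eps_model, betas, alphas, alpha_bars)
        x = mu + np.sqrt(betas[k]) * z
    return x

rng = np.random.default_rng(0)
betas, alphas, alpha_bars = make_schedule()
signal = np.sin(np.linspace(0.0, 8.0 * np.pi, 256))  # toy 1-D "audio"
x_last, zs = ddpm_invert(signal, toy_eps_model, betas, alphas, alpha_bars, rng)
reconstruction = ddpm_sample(x_last, zs, toy_eps_model, betas, alphas, alpha_bars)
max_error = float(np.max(np.abs(reconstruction - signal)))
```

With the extracted noise vectors, sampling with the same noise predictor returns the input signal up to floating-point error, whatever the predictor is. In a text-based setting, the edit would presumably come from rerunning the sampler with the same starting state and noise vectors but a different text condition. The toy predictor here has no conditioning, so the sketch shows reconstruction only.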

Code & Models

Repositories

jinhualiang/tage
pytorch

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing

Methods: Diffusion