Audio Texture Manipulation by Exemplar-Based Analogy

Kan Jen Cheng; Tingle Li; Gopala Anumanchipalli

arXiv:2501.12385·cs.SD·January 22, 2025

TL;DR

This paper introduces an exemplar-based analogy model for audio texture manipulation that learns transformations from paired speech examples, outperforming text-conditioned baselines and generalizing to real-world, out-of-distribution, and non-speech scenarios.

Contribution

The paper presents a novel self-supervised latent diffusion model for audio texture manipulation using paired examples, avoiding text-based conditioning and enhancing generalization.

Findings

01 Outperforms text-conditioned baselines in evaluations
02 Generalizes to out-of-distribution and non-speech sounds
03 Effective in real-world audio texture editing tasks

Abstract

Audio texture manipulation involves modifying the perceptual characteristics of a sound to achieve specific transformations, such as adding, removing, or replacing auditory elements. In this paper, we propose an exemplar-based analogy model for audio texture manipulation. Instead of conditioning on text-based instructions, our method uses paired speech examples, where one clip represents the original sound and another illustrates the desired transformation. The model learns to apply the same transformation to new input, allowing for the manipulation of sound textures. We construct a quadruplet dataset representing various editing tasks, and train a latent diffusion model in a self-supervised manner. We show through quantitative evaluations and perceptual studies that our model outperforms text-conditioned baselines and generalizes to real-world, out-of-distribution, and non-speech…
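The abstract does not spell out how the quadruplet dataset is built, so the following is only a minimal sketch of one plausible construction, assuming each quadruplet comes from mixing two speech clips with one or two texture clips. The function names (`mix`, `make_quadruplet`), the dictionary keys, the fixed `gain`, and the simple additive mixing are all hypothetical and are not taken from the paper. The point it illustrates is the analogy structure: the exemplar pair and the query/target pair differ by the same transformation (add, remove, or replace).

```python
import numpy as np

def mix(speech, texture=None, gain=0.5):
    """Overlay a texture on a speech clip (hypothetical additive mixing).

    Assumes the texture is at least as long as the speech clip.
    With no texture, the speech is returned unchanged.
    """
    if texture is None:
        return speech.copy()
    return speech + gain * texture[: len(speech)]

def make_quadruplet(task, speech_a, speech_b, tex_x, tex_y=None):
    """Build (exemplar_in, exemplar_out, query_in, target_out) for one task.

    The same before/after texture change is applied to both speech clips,
    so the exemplar pair demonstrates the edit the model should apply
    to the query.
    """
    if task == "add":
        before, after = None, tex_x
    elif task == "remove":
        before, after = tex_x, None
    elif task == "replace":
        if tex_y is None:
            raise ValueError("replace needs a second texture")
        before, after = tex_x, tex_y
    else:
        raise ValueError(f"unknown task: {task}")
    return {
        "exemplar_in": mix(speech_a, before),
        "exemplar_out": mix(speech_a, after),
        "query_in": mix(speech_b, before),
        "target_out": mix(speech_b, after),
    }

# Usage with random stand-ins for one-second clips at 16 kHz.
rng = np.random.default_rng(0)
speech_a, speech_b, rain, wind = (rng.standard_normal(16000) for _ in range(4))
quad = make_quadruplet("replace", speech_a, speech_b, rain, wind)
```

Under this assumed setup the supervision is self-generated: the target is computed from the same ingredients as the inputs, so no text labels or manual annotations are needed.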

Taxonomy

Topics: Music Technology and Sound Studies · Music and Audio Processing

Methods: Diffusion · Contrastive Language-Image Pre-training · Latent Diffusion Model