Audio Texture Manipulation by Exemplar-Based Analogy
Kan Jen Cheng, Tingle Li, Gopala Anumanchipalli

TL;DR
This paper introduces an exemplar-based analogy model for audio texture manipulation that learns a transformation from a pair of speech examples (an original clip and its edited version) and applies it to new input. It outperforms text-conditioned baselines and generalizes to real-world, out-of-distribution, and non-speech audio.
Contribution
The paper presents a novel self-supervised latent diffusion model for audio texture manipulation that is conditioned on paired examples instead of text instructions, which improves generalization.
Findings
Outperforms text-conditioned baselines in quantitative evaluations and perceptual studies
Generalizes to out-of-distribution and non-speech sounds
Effective in real-world audio texture editing tasks
Abstract
Audio texture manipulation involves modifying the perceptual characteristics of a sound to achieve specific transformations, such as adding, removing, or replacing auditory elements. In this paper, we propose an exemplar-based analogy model for audio texture manipulation. Instead of conditioning on text-based instructions, our method uses paired speech examples, where one clip represents the original sound and another illustrates the desired transformation. The model learns to apply the same transformation to new input, allowing for the manipulation of sound textures. We construct a quadruplet dataset representing various editing tasks, and train a latent diffusion model in a self-supervised manner. We show through quantitative evaluations and perceptual studies that our model outperforms text-conditioned baselines and generalizes to real-world, out-of-distribution, and non-speech…
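The abstract describes a quadruplet dataset in which an exemplar pair and a query pair share one editing task (adding, removing, or replacing an auditory element). The sketch below shows one way such quadruplets could be generated. It is an illustration under stated assumptions, not the authors' pipeline: it assumes textures are mixed additively into clean speech at a fixed signal-to-noise ratio, and the helpers `mix_at_snr` and `make_quadruplet`, as well as the synthetic stand-in signals, are hypothetical.

```python
import numpy as np

# Hypothetical sketch: assumes additive texture mixing at a fixed SNR.
# This is NOT the paper's data pipeline; names are illustrative only.


def mix_at_snr(clean: np.ndarray, texture: np.ndarray, snr_db: float) -> np.ndarray:
    """Additively mix `texture` into `clean` at a target clean-to-texture SNR (dB)."""
    # Tile or trim the texture so it matches the clip length.
    texture = np.resize(texture, clean.shape)
    p_clean = np.mean(clean ** 2)
    p_texture = np.mean(texture ** 2) + 1e-12  # guard against silent textures
    # Choose gain g so that p_clean / (g**2 * p_texture) == 10 ** (snr_db / 10).
    gain = np.sqrt(p_clean / (p_texture * 10 ** (snr_db / 10)))
    return clean + gain * texture


def make_quadruplet(clip_a, clip_b, texture, snr_db, task="add", new_texture=None):
    """Return (exemplar_in, exemplar_out, query_in, query_target).

    The same edit (same texture, same SNR) is applied to both clips, so the
    exemplar pair A -> A' demonstrates exactly the transformation B -> B'.
    """
    if task == "add":
        return (clip_a, mix_at_snr(clip_a, texture, snr_db),
                clip_b, mix_at_snr(clip_b, texture, snr_db))
    if task == "remove":
        return (mix_at_snr(clip_a, texture, snr_db), clip_a,
                mix_at_snr(clip_b, texture, snr_db), clip_b)
    if task == "replace":
        if new_texture is None:
            raise ValueError("task='replace' needs new_texture")
        return (mix_at_snr(clip_a, texture, snr_db), mix_at_snr(clip_a, new_texture, snr_db),
                mix_at_snr(clip_b, texture, snr_db), mix_at_snr(clip_b, new_texture, snr_db))
    raise ValueError(f"unknown task: {task!r}")


# Usage with synthetic stand-ins for two 1 s speech clips at 16 kHz.
rng = np.random.default_rng(0)
sr = 16_000
speech_a = rng.standard_normal(sr)
speech_b = rng.standard_normal(sr)
hum = np.sin(2 * np.pi * 60 * np.arange(sr // 2) / sr)  # 60 Hz hum, tiled to clip length
ex_in, ex_out, query_in, query_target = make_quadruplet(speech_a, speech_b, hum, snr_db=10.0)
```

Sharing the texture and SNR across both pairs is what makes the exemplar pair an exact demonstration of the query edit; a real dataset would presumably vary clips, textures, and gains across quadruplets.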
Taxonomy
Topics: Music Technology and Sound Studies · Music and Audio Processing
Methods: Diffusion · Contrastive Language-Image Pre-training · Latent Diffusion Model
