MorphFader: Enabling Fine-grained Controllable Morphing with Text-to-Audio Models
Purnima Kamath, Chitralekha Gupta, Suranga Nanayakkara

TL;DR
MorphFader introduces a novel technique to achieve fine-grained control over sound morphing in text-to-audio models by interpolating cross-attention components, enabling smooth and perceptually meaningful sound transformations.
Contribution
It presents a new method for sound morphing that leverages cross-attention interpolation in diffusion models, allowing detailed semantic control during sound transformation.
Findings
Interpolating the components of the cross-attention layers produces smooth morphs between sounds generated from different text prompts.
Objective metrics and listening tests confirm granular semantic control.
Method outperforms baseline approaches in sound quality and control.
Abstract
Sound morphing is the process of gradually and smoothly transforming one sound into another to generate novel and perceptually hybrid sounds that simultaneously resemble both. Recently, diffusion-based text-to-audio models have produced high-quality sounds using text prompts. However, granularly controlling the semantics of the sound, which is necessary for morphing, can be challenging using text. In this paper, we propose MorphFader, a controllable method for morphing sounds generated by disparate prompts using text-to-audio models. By intercepting and interpolating the components of the cross-attention layers within the diffusion process, we can create smooth morphs between sounds generated by different text prompts. Using both objective metrics and perceptual listening tests, we demonstrate the ability of our method to granularly control the semantics in the sound and…
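The idea of interpolating cross-attention components can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not say which components are interpolated or how, so the sketch assumes a single attention head, linear interpolation, and that only the keys and values (the parts derived from the text prompts) are blended. All function names and the morph parameter `alpha` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(q, k, v):
    # Single-head scaled dot-product attention.
    # q: (n_latent, d) queries from the audio latent
    # k, v: (n_tokens, d) keys and values from the text embedding
    d = q.shape[-1]
    weights = softmax(q @ k.T / np.sqrt(d), axis=-1)
    return weights @ v


def lerp(a, b, alpha):
    # Linear interpolation: alpha=0 returns a, alpha=1 returns b.
    return (1.0 - alpha) * a + alpha * b


def morphed_cross_attention(q, k_src, v_src, k_tgt, v_tgt, alpha):
    # ASSUMPTION: the morph blends keys and values of the two prompts.
    # The paper may interpolate different or additional components.
    k = lerp(k_src, k_tgt, alpha)
    v = lerp(v_src, v_tgt, alpha)
    return cross_attention(q, k, v)


# Toy data standing in for one cross-attention layer at one diffusion step.
rng = np.random.default_rng(0)
q = rng.normal(size=(16, 8))      # 16 latent positions, dimension 8
k_src = rng.normal(size=(4, 8))   # 4 text tokens for the source prompt
v_src = rng.normal(size=(4, 8))
k_tgt = rng.normal(size=(4, 8))   # 4 text tokens for the target prompt
v_tgt = rng.normal(size=(4, 8))

# Sweep the morph parameter from the source prompt to the target prompt.
alphas = np.linspace(0.0, 1.0, 5)
outputs = [
    morphed_cross_attention(q, k_src, v_src, k_tgt, v_tgt, a) for a in alphas
]
```

In this toy setup, `alpha = 0` reproduces the attention output for the source prompt and `alpha = 1` the output for the target prompt; intermediate values give blended outputs. In a real text-to-audio model, the same blending would have to be applied inside every cross-attention layer at every denoising step, which this sketch does not attempt.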
Taxonomy
Topics: Music Technology and Sound Studies · Human Motion and Animation
Methods: Diffusion
