Text2FX: Harnessing CLAP Embeddings for Text-Guided Audio Effects
Annie Chu, Patrick O'Reilly, Julia Barnett, Bryan Pardo

TL;DR
Text2FX is a method that combines CLAP (Contrastive Language-Audio Pretraining) embeddings with differentiable signal processing to control audio effects through natural-language prompts. It requires no model retraining and supports flexible, open-vocabulary sound transformations.
Contribution
It uses CLAP embeddings for text-guided audio-effect control and proposes two optimization methods, extending shared text-audio embedding spaces to audio manipulation.
Findings
CLAP encodes useful information for audio effects control
Two optimization approaches effectively map text to effect parameters
Listener studies show good alignment with human perception
Abstract
This work introduces Text2FX, a method that leverages CLAP embeddings and differentiable digital signal processing to control audio effects, such as equalization and reverberation, using open-vocabulary natural language prompts (e.g., "make this sound in-your-face and bold"). Text2FX operates without retraining any models, relying instead on single-instance optimization within the existing embedding space, thus enabling a flexible, scalable approach to open-vocabulary sound transformations through interpretable and disentangled FX manipulation. We show that CLAP encodes valuable information for controlling audio effects and propose two optimization approaches using CLAP to map text to audio effect parameters. While we demonstrate with CLAP, this approach is applicable to any shared text-audio embedding space. Similarly, while we demonstrate with equalization and reverberation, any…
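The abstract describes single-instance optimization: for one input clip, effect parameters are tuned so that the embedding of the processed audio moves toward the embedding of the text prompt, with gradients flowing through a differentiable effect. The sketch below illustrates that idea under heavy assumptions. The 6-band FFT equalizer, the random linear `audio_embed` "encoder", and the random `text_emb` vector are hypothetical stand-ins (not CLAP and not the authors' EQ). Cosine similarity is assumed as the objective, and central finite differences replace autodiff to keep the example dependency-light. It should not be read as a reproduction of either of the two approaches the paper proposes.

```python
import numpy as np

SR = 16000      # sample rate in Hz (assumed)
N_BANDS = 6     # number of EQ bands (hypothetical parametrization)
EMB_DIM = 8     # toy embedding size; real CLAP embeddings are larger

rng = np.random.default_rng(0)
# Hypothetical stand-in for a frozen CLAP audio encoder: a fixed random
# linear map of per-band log energies. This is NOT the real CLAP model.
W = rng.standard_normal((EMB_DIM, N_BANDS))


def band_edges(n_bins, n_bands=N_BANDS):
    """Split rfft bins into contiguous, equal-width bands."""
    return np.linspace(0, n_bins, n_bands + 1).astype(int)


def apply_eq(x, gains_db):
    """Graphic-style EQ: scale each band's FFT bins by a gain given in dB."""
    X = np.fft.rfft(x)
    edges = band_edges(len(X))
    H = np.ones(len(X))
    for k, g in enumerate(gains_db):
        H[edges[k]:edges[k + 1]] = 10.0 ** (g / 20.0)
    return np.fft.irfft(X * H, n=len(x))


def audio_embed(y):
    """Toy 'audio encoder': linear map of per-band log energies (dB)."""
    P = np.abs(np.fft.rfft(y)) ** 2
    edges = band_edges(len(P))
    feats = np.array([10.0 * np.log10(P[a:b].sum() + 1e-12)
                      for a, b in zip(edges[:-1], edges[1:])])
    return W @ (feats / 10.0)  # crude scaling keeps values moderate


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def objective(x, gains_db, text_emb):
    """Similarity between the processed clip's embedding and the text embedding."""
    return cosine(audio_embed(apply_eq(x, gains_db)), text_emb)


def optimize_eq(x, text_emb, steps=100, lr=500.0, max_db=12.0, h=1e-3):
    """Gradient ascent on EQ gains (dB) for a single input clip."""
    g = np.zeros(N_BANDS)
    best = objective(x, g, text_emb)
    for _ in range(steps):
        # Central finite differences stand in for autodiff through the effect.
        grad = np.zeros_like(g)
        for k in range(N_BANDS):
            e = np.zeros_like(g)
            e[k] = h
            grad[k] = (objective(x, g + e, text_emb)
                       - objective(x, g - e, text_emb)) / (2 * h)
        # Backtracking: shrink the step until the objective improves.
        step = lr
        while step > 1e-6:
            cand = np.clip(g + step * grad, -max_db, max_db)
            val = objective(x, cand, text_emb)
            if val > best:
                g, best = cand, val
                break
            step *= 0.5
    return g, best


x = rng.standard_normal(SR)  # 1 s of white noise as the input clip
# Hypothetical target: in Text2FX this would come from a text encoder for a
# prompt such as "bright"; here it is just a random vector.
text_emb = rng.standard_normal(EMB_DIM)
before = objective(x, np.zeros(N_BANDS), text_emb)
gains, after = optimize_eq(x, text_emb)
print(f"cosine before={before:.3f} after={after:.3f}")
print("EQ gains (dB):", np.round(gains, 1))
```

In a real system, `audio_embed` and `text_emb` would come from a CLAP model's audio and text encoders, and the effect would be implemented in an autodiff framework so that gradients are exact rather than approximated. Keeping the model frozen and optimizing only the few effect parameters is what makes the result interpretable: the output is a set of EQ gains, not a new network.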
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Natural Language Processing Techniques
