FxSearcher: gradient-free text-driven audio transformation

Hojoon Ki; Jongsuk Kim; Minchan Kwon; Junmo Kim

arXiv:2511.14138 · eess.AS · November 21, 2025

TL;DR

FxSearcher is a gradient-free framework that uses Bayesian Optimization with a CLAP-based score to efficiently discover audio effect configurations for text-driven audio transformation. Its top scores under the proposed AI-based evaluation align closely with human preferences.

Contribution

It introduces a gradient-free approach that combines Bayesian Optimization with a CLAP-based score to discover audio effect configurations from a text prompt, together with an AI-based evaluation framework.

Findings

01

High alignment with human preferences in audio transformation quality

02

Effective discovery of audio effect configurations without gradients

03

Demonstrated superior performance over baseline methods

Abstract

Achieving diverse and high-quality audio transformations from text prompts remains challenging, as existing methods are fundamentally constrained by their reliance on a limited set of differentiable audio effects. This paper proposes FxSearcher, a novel gradient-free framework that discovers the optimal configuration of audio effects (FX) to transform a source signal according to a text prompt. Our method employs Bayesian Optimization and a CLAP-based score function to perform this search efficiently. Furthermore, a guiding prompt is introduced to prevent undesirable artifacts and enhance human preference. To objectively evaluate our method, we propose an AI-based evaluation framework. The results demonstrate that the highest scores achieved by our method on these metrics align closely with human preferences. Demos are available at https://hojoonki.github.io/FxSearcher/
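The abstract describes the search only at a high level. As a rough illustration of what a gradient-free Bayesian Optimization loop over effect parameters can look like, here is a generic sketch: a Gaussian-process surrogate with an RBF kernel, an expected-improvement acquisition, and random candidate sampling over a normalized parameter space. It is not the authors' implementation. `toy_score`, the two-parameter space, and every hyperparameter (`length`, `noise`, `n_init`, `n_iter`, `n_candidates`) are hypothetical assumptions. The toy objective only stands in for a CLAP-based score, and the guiding prompt is not modeled.

```python
import math
import random

def toy_score(params):
    """Hypothetical stand-in for a CLAP-based score (no model is loaded here).

    A real pipeline would render the source audio through the FX chain set by
    `params` and return the CLAP audio-text similarity to the prompt. Here the
    "prompt" is simply assumed to be best matched at params = [0.7, 0.2].
    """
    target = [0.7, 0.2]
    return -sum((p - t) ** 2 for p, t in zip(params, target))

def rbf(a, b, length=0.2):
    # Squared-exponential kernel with unit signal variance.
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * d2 / length ** 2)

def cholesky(A):
    # Lower-triangular L with A = L L^T (A symmetric positive definite).
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(max(A[i][i] - s, 1e-12))
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def forward(L, b):
    # Solve L z = b by forward substitution.
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z

def backward(L, z):
    # Solve L^T x = z by back substitution using the transpose of L.
    n = len(z)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def fit_gp(X, y, noise=1e-4):
    # Fit a GP to mean-centered scores; return a function x -> (mean, std).
    n = len(X)
    K = [[rbf(X[i], X[j]) + (noise if i == j else 0.0) for j in range(n)] for i in range(n)]
    L = cholesky(K)
    y_mean = sum(y) / n
    alpha = backward(L, forward(L, [v - y_mean for v in y]))

    def predict(x):
        k = [rbf(xi, x) for xi in X]
        mu = y_mean + sum(ki * ai for ki, ai in zip(k, alpha))
        v = forward(L, k)
        var = max(1.0 - sum(t * t for t in v), 1e-12)
        return mu, math.sqrt(var)

    return predict

def expected_improvement(mu, sigma, best, xi=0.01):
    # Closed-form expected improvement for maximization under a Gaussian posterior.
    z = (mu - best - xi) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mu - best - xi) * cdf + sigma * pdf

def bayes_opt(score, dim, n_init=5, n_iter=15, n_candidates=200, seed=0):
    # Search normalized FX parameters in [0, 1]^dim using only score evaluations.
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(dim)] for _ in range(n_init)]
    y = [score(x) for x in X]
    for _ in range(n_iter):
        predict = fit_gp(X, y)
        best = max(y)
        candidates = [[rng.random() for _ in range(dim)] for _ in range(n_candidates)]
        x_next = max(candidates, key=lambda c: expected_improvement(*predict(c), best))
        X.append(x_next)
        y.append(score(x_next))
    i = max(range(len(y)), key=lambda j: y[j])
    return X[i], y[i]

if __name__ == "__main__":
    params, value = bayes_opt(toy_score, dim=2)
    print("best params:", [round(p, 3) for p in params], "score:", round(value, 4))
```

The loop only ever calls `score(x)`, so replacing `toy_score` with a function that renders audio through real effects and queries a CLAP model would leave the optimizer unchanged. Expensive, gradient-free evaluations like these are the usual motivation for a surrogate-guided search over plain random or grid search.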

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing