AudioGenX: Explainability on Text-to-Audio Generative Models

Hyunju Kang; Geonhee Han; Yoonjae Jeong; Hogun Park

arXiv:2502.00459 · cs.SD · October 20, 2025

TL;DR

AudioGenX is a novel explainability method for text-to-audio models. It highlights the importance of each input text token, which improves the transparency and trustworthiness of audio generated from text descriptions.

Contribution

We introduce AudioGenX, an explainability technique that uses factual and counterfactual objectives to provide faithful, token-level explanations for text-to-audio models.

Findings

1. AudioGenX produces more faithful explanations than existing methods.
2. The method gives a clearer understanding of how text inputs relate to the generated audio.
3. Experimental results, which use new evaluation metrics, validate the effectiveness of AudioGenX.

Abstract

Text-to-audio generation models (TAG) have achieved significant advances in generating audio conditioned on text descriptions. However, a critical challenge lies in the lack of transparency regarding how each textual input impacts the generated audio. To address this issue, we introduce AudioGenX, an Explainable AI (XAI) method that provides explanations for text-to-audio generation models by highlighting the importance of input tokens. AudioGenX optimizes an Explainer by leveraging factual and counterfactual objective functions to provide faithful explanations at the audio token level. This method offers a detailed and comprehensive understanding of the relationship between text inputs and audio outputs, enhancing both the explainability and trustworthiness of TAG models. Extensive experiments demonstrate the effectiveness of AudioGenX in producing faithful explanations, benchmarked…
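The abstract describes optimizing an Explainer with factual and counterfactual objectives so that token-importance scores are faithful for each generated audio token. The sketch below illustrates that general idea on a toy problem. It is not the authors' implementation, and several parts are assumptions:

- A hypothetical three-word prompt and a linear-softmax "generator" (`EMBED`, `W`) stand in for a real TAG model.
- A free per-token soft mask replaces the learned Explainer network.
- The loss is illustrative. It combines a factual negative log-likelihood, a counterfactual `-log(1 - p)` term, and a sparsity penalty weighted by `lam`.
- Finite differences replace autograd.

```python
import numpy as np

# HYPOTHETICAL toy stand-in for a TAG model: a masked sum of text-token
# embeddings is mapped to a softmax over a tiny audio-token vocabulary.
TEXT_TOKENS = ["a", "dog", "barking"]
EMBED = np.eye(3)                       # one embedding row per text token
W = np.array([[0.0, 1.0, 4.0],          # logits for audio token 0 (the one explained)
              [1.0, 0.0, 0.0],          # audio token 1
              [0.0, 0.0, 0.0]])         # audio token 2
TARGET = 0                              # generated audio token being explained


def audio_token_probs(mask: np.ndarray) -> np.ndarray:
    """Softmax over audio tokens when each text embedding is scaled by its mask weight."""
    cond = (mask[:, None] * EMBED).sum(axis=0)
    logits = W @ cond
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()


def explainer_loss(theta: np.ndarray, lam: float = 0.1) -> float:
    """Factual + counterfactual + sparsity objective on m = sigmoid(theta) (ASSUMED form)."""
    m = 1.0 / (1.0 + np.exp(-theta))
    p_fact = audio_token_probs(m)[TARGET]        # keep only the highlighted tokens
    p_cf = audio_token_probs(1.0 - m)[TARGET]    # keep only the non-highlighted tokens
    factual = -np.log(p_fact)                    # highlighted tokens should reproduce the token
    counterfactual = -np.log(1.0 - p_cf)         # removing them should make it unlikely
    return factual + counterfactual + lam * m.sum()


def numerical_grad(f, x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Central finite differences; a real implementation would use autograd."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad


def explain(steps: int = 500, lr: float = 0.5) -> np.ndarray:
    """Gradient descent on the mask logits; returns token-importance scores in (0, 1)."""
    theta = np.zeros(len(TEXT_TOKENS))           # every mask weight starts at 0.5
    for _ in range(steps):
        theta -= lr * numerical_grad(explainer_loss, theta)
    return 1.0 / (1.0 + np.exp(-theta))


if __name__ == "__main__":
    scores = explain()
    for tok, s in sorted(zip(TEXT_TOKENS, scores), key=lambda t: -t[1]):
        print(f"{tok:>8s}: {s:.3f}")
```

On this toy, "barking" is ranked highest, "dog" second and "a" lowest, which follows from how the hypothetical weights `W` were chosen. The factual term rewards masks that alone preserve the target audio token. The counterfactual term rewards masks whose removal suppresses it, so a token only scores high if it is both sufficient and necessary in this sense. The actual Explainer architecture and objective are in the official PyTorch repository listed on this page.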

Code & Models

Repositories

hjkng/audiogenX (PyTorch, official)

Videos

AudioGenX: Explainability on Text-to-Audio Generative Models (Underline)

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies