AR&D: A Framework for Retrieving and Describing Concepts for Interpreting AudioLLMs
Townim Faisal Chowdhury, Ta Duc Huy, Siqi Pan, Jeremy Stoddard, Zhibin Liao

TL;DR
This paper introduces an interpretability framework for AudioLLMs that uses sparse autoencoders to disentangle polysemantic activations into monosemantic features, improving transparency and enabling better control over model behavior.
Contribution
The paper presents the first mechanistic interpretability framework for AudioLLMs. Its pipeline identifies representative audio clips for each feature, names the underlying concepts via automated captioning, and validates them through human evaluation and steering.
Findings
AudioLLMs encode structured, interpretable features
The framework improves transparency and control
It lays groundwork for trustworthy deployment in high-stakes domains
Abstract
Despite strong performance in audio perception tasks, large audio-language models (AudioLLMs) remain opaque to interpretation. A major factor behind this lack of interpretability is that individual neurons in these models frequently activate in response to several unrelated concepts. We introduce the first mechanistic interpretability framework for AudioLLMs, leveraging sparse autoencoders (SAEs) to disentangle polysemantic activations into monosemantic features. Our pipeline identifies representative audio clips, assigns meaningful names via automated captioning, and validates concepts through human evaluation and steering. Experiments show that AudioLLMs encode structured and interpretable features, enhancing transparency and control. This work provides a foundation for trustworthy deployment in high-stakes domains and enables future extensions to larger models, multilingual audio,…
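The abstract does not specify the SAE architecture, so the following is only a minimal sketch of the general technique, assuming a standard ReLU sparse autoencoder with an L1 sparsity penalty. It illustrates three steps the abstract names: decomposing activations into sparse features, retrieving the clips that most strongly activate a feature, and steering by adding a feature's decoder direction back to an activation. All function names, parameter names, and shapes here (sae_forward, sae_loss, top_clips, steer, l1_coeff, alpha) are hypothetical and are not taken from the paper.

```python
import numpy as np

def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Encode activations x (n, d) into sparse features (n, m) and reconstruct.

    Assumed shapes: W_enc (d, m), b_enc (m,), W_dec (m, d), b_dec (d,).
    """
    # ReLU keeps features non-negative; many end up exactly zero.
    f = np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)
    # Each row of W_dec is the direction one feature writes into activation space.
    x_hat = f @ W_dec + b_dec
    return f, x_hat

def sae_loss(x, f, x_hat, l1_coeff=1e-3):
    """Reconstruction error plus an L1 penalty that encourages sparsity."""
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=-1))
    sparsity = np.mean(np.sum(np.abs(f), axis=-1))
    return recon + l1_coeff * sparsity

def top_clips(features, feature_idx, k=3):
    """Return indices of the k clips with the highest activation on one feature.

    features is (num_clips, m), e.g. per-clip pooled feature activations.
    """
    order = np.argsort(-features[:, feature_idx])
    return order[:k]

def steer(x, W_dec, feature_idx, alpha):
    """Shift activations along one feature's decoder direction by alpha."""
    return x + alpha * W_dec[feature_idx]
```

In a pipeline of this shape, the clips returned by top_clips would be the ones passed to a captioning model to name the feature, and steer would be applied to the model's activations at the layer where the SAE was trained to check that the named concept changes the output as expected.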
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Music and Audio Processing · Speech Recognition and Synthesis
