Localizing and Editing Knowledge in Large Audio-Language Models
Sung Kyun Chung, Jiaheng Dong, Qiuchi Hu, Gongping Huang, Hong Jia, Ting Dang

TL;DR
This paper introduces a framework for localizing and editing factual knowledge in large audio-language models, and shows that audio editing yields more effective knowledge updates than text editing.
Contribution
It presents the first audio benchmark for knowledge editing in LALMs and a speech-driven locate-then-edit framework that enhances factual updates.
Findings
Audio and text modules jointly encode knowledge.
Audio editing outperforms text editing for knowledge updates.
Proposed method enables fine-grained control of factual knowledge.
Abstract
Large Audio-Language Models (LALMs) have shown strong performance in speech understanding, making speech a natural interface for accessing factual information. Yet they are trained on static corpora and may encode incorrect facts. Existing model editing methods localize and update facts in text-only LLMs, but they do not account for continuous speech representations, nor for where knowledge is stored: in the acoustic modules, the language modules, or the cross-modal module between them. We construct the first audio benchmark for knowledge localization and editing in LALMs and propose a speech-driven locate-then-edit framework. First, we use speech-aware causal tracing to localize the layers and modules that support factual retrieval; we then apply editing at the identified sites. Experiments show that factual knowledge is jointly encoded in audio and text modules, and that audio editing yields more effective updates than text…
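The locate step in the abstract relies on causal tracing. As a rough illustration of the general idea (not the paper's speech-aware implementation), the sketch below runs a toy model three ways: clean, with noise added to the "subject" input positions, and with one clean hidden state restored at a chosen layer and position. The recovery in the output score indicates how much that site matters. The toy model, the function names `run` and `causal_trace`, and all parameters are assumptions made for this example.

```python
import math
import random

def run(inputs, weights, patch=None):
    """Forward pass through a toy stack of mixing layers.

    Each position holds one scalar. patch is (layer, position, value) or None;
    when given, that single hidden state is overwritten after the layer runs.
    Returns (score, states), where states[l][t] is the state after layer l.
    """
    h = list(inputs)
    states = []
    for l, A in enumerate(weights):
        h = [math.tanh(sum(A[t][s] * h[s] for s in range(len(h))))
             for t in range(len(h))]
        if patch is not None and patch[0] == l:
            h[patch[1]] = patch[2]
        states.append(list(h))
    return h[-1], states  # score = final layer, last position

def causal_trace(clean_inputs, weights, subject_positions, noise=1.0, seed=0):
    """Return (clean_score, corrupted_score, effects[layer][position])."""
    rng = random.Random(seed)
    clean_score, clean_states = run(clean_inputs, weights)

    # Corrupt only the subject positions with Gaussian noise.
    corrupted = list(clean_inputs)
    for t in subject_positions:
        corrupted[t] += rng.gauss(0.0, noise)
    corrupted_score, _ = run(corrupted, weights)

    # Restore one clean state at a time and measure the recovery.
    effects = []
    for l in range(len(weights)):
        row = []
        for t in range(len(clean_inputs)):
            restored, _ = run(corrupted, weights,
                              patch=(l, t, clean_states[l][t]))
            row.append(restored - corrupted_score)
        effects.append(row)
    return clean_score, corrupted_score, effects

# Hypothetical toy setup: 3 layers, 4 positions, positions 0 and 1 are the subject.
_rng = random.Random(1)
T, L = 4, 3
weights = [[[_rng.uniform(-1, 1) for _ in range(T)] for _ in range(T)]
           for _ in range(L)]
clean_inputs = [0.5, -0.3, 0.8, 0.1]
clean_score, corrupted_score, effects = causal_trace(
    clean_inputs, weights, subject_positions=[0, 1])
```

In a real LALM the positions would be audio or text token positions and the states would be per-module activation vectors. Sites with large effects are the candidates for the subsequent edit.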
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
