Localizing and Editing Knowledge in Large Audio-Language Models

Sung Kyun Chung; Jiaheng Dong; Qiuchi Hu; Gongping Huang; Hong Jia; Ting Dang

arXiv:2603.14343·cs.LG·March 17, 2026

Open Access

TL;DR

This paper introduces a new framework for localizing and editing factual knowledge in large audio-language models, demonstrating that audio editing improves knowledge updates over text-only methods.

Contribution

It presents the first audio benchmark for knowledge editing in LALMs and a speech-driven locate-then-edit framework that enhances factual updates.

Findings

01 Audio and text modules jointly encode knowledge.

02 Audio editing outperforms text editing for knowledge updates.

03 Proposed method enables fine-grained control of factual knowledge.

Abstract

Large Audio-Language Models (LALMs) have shown strong performance in speech understanding, making speech a natural interface for accessing factual information. Yet they are trained on static corpora and may encode incorrect facts. Existing model editing methods localize and update facts in text-only LLMs, but they do not account for continuous speech representations or for where knowledge is stored: in the acoustic modules, the language modules, or the cross-modal module. We construct the first audio benchmark for knowledge localization and editing in LALMs and propose a speech-driven locate-then-edit framework. We first use speech-aware causal tracing to localize the layers and modules that support factual retrieval, and then apply editing at the identified sites. Experiments show that factual knowledge is jointly encoded in audio and text modules, and that audio editing yields more effective updates than text…
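
The localization step described in the abstract can be illustrated with a minimal sketch of generic causal tracing (activation patching) on a toy model. This is not the paper's speech-aware procedure or code: the two-stream model, the names `run`, `layer_step`, and `causal_trace`, the stream labels `a` (audio-side) and `t` (text-side), and the deterministic corruption are all assumptions made for illustration. The idea is to run the model on a clean input, run it on a corrupted input, then restore one clean hidden state at a time in the corrupted run and measure how much of the correct-answer score comes back.

```python
import math

# Toy two-stream model (hypothetical). Stream "a" stands in for audio-side
# states and stream "t" for text-side states.
NUM_LAYERS = 4
TRANSFER_LAYER = 2  # layer at which stream t reads from stream a


def layer_step(layer, a, t):
    """Advance both streams by one layer."""
    new_a = math.tanh(1.5 * a)
    # Stream t only receives information from stream a at TRANSFER_LAYER.
    new_t = t + (2.0 * a if layer == TRANSFER_LAYER else 0.0)
    return new_a, new_t


def run(a0, patch=None):
    """Run the model from input a0.

    patch = (layer, stream, value) overwrites one stream's state right
    after that layer has been computed.
    """
    a, t = a0, 0.0
    states = []
    for layer in range(NUM_LAYERS):
        a, t = layer_step(layer, a, t)
        if patch is not None and patch[0] == layer:
            if patch[1] == "a":
                a = patch[2]
            else:
                t = patch[2]
        states.append((a, t))
    score = 1.0 / (1.0 + math.exp(-t))  # stand-in for p(correct answer)
    return score, states


def causal_trace(clean_a0=1.0, corrupted_a0=-1.0):
    """Return the score recovered by restoring each (layer, stream) state."""
    _, clean_states = run(clean_a0)
    corrupted_score, _ = run(corrupted_a0)
    effects = {}
    for layer in range(NUM_LAYERS):
        for idx, stream in enumerate(("a", "t")):
            restored = clean_states[layer][idx]
            patched_score, _ = run(corrupted_a0, patch=(layer, stream, restored))
            effects[(layer, stream)] = patched_score - corrupted_score
    return effects


if __name__ == "__main__":
    for (layer, stream), effect in sorted(causal_trace().items()):
        print(f"layer {layer} stream {stream}: recovery {effect:.3f}")
```

In this toy setup, restoring stream `a` recovers the score only at layers before the transfer layer, and restoring stream `t` recovers it only from the transfer layer onward. A locate-then-edit pipeline would then apply its weight update at the sites with the largest recovery; real causal tracing typically corrupts inputs with random noise and averages over several runs, which this sketch omits.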

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing