Are Audio-Language Models Listening? Audio-Specialist Heads for Adaptive Audio Steering

Neta Glazer; Lenny Aharon; Ethan Fetaya

arXiv:2603.06854·cs.SD·March 10, 2026

Open Access

TL;DR

This paper identifies a small set of audio-specialist attention heads in large audio-language models and applies an inference-time steering intervention that makes the models use audio evidence more effectively, improving accuracy without retraining.

Contribution

It introduces a mechanistic interpretability approach to locate audio-specialist attention heads and applies an inference-time intervention to boost audio engagement.
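As a rough illustration of what a "listening" signal could look like, the sketch below scores each head by the fraction of its attention that lands on audio tokens, then averages that score over a chosen set of heads. It is a minimal sketch under stated assumptions, not the paper's procedure: the `(layer, head)` indexing, the choice of a single query position, the top-k selection rule, and all function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Head = Tuple[int, int]  # (layer index, head index); hypothetical indexing


def audio_attention_mass(attn_row: Sequence[float], audio_mask: Sequence[bool]) -> float:
    """Fraction of one query position's attention that falls on audio tokens."""
    if len(attn_row) != len(audio_mask):
        raise ValueError("attention row and audio mask must have the same length")
    total = sum(attn_row)
    if total <= 0.0:
        return 0.0
    audio = sum(w for w, is_audio in zip(attn_row, audio_mask) if is_audio)
    return audio / total


def top_audio_heads(
    attn_by_head: Dict[Head, Sequence[float]],
    audio_mask: Sequence[bool],
    k: int,
) -> List[Head]:
    """Assumed selection rule: keep the k heads with the largest audio attention mass."""
    scores = {h: audio_attention_mass(row, audio_mask) for h, row in attn_by_head.items()}
    return sorted(scores, key=lambda h: scores[h], reverse=True)[:k]


def listening_signal(
    attn_by_head: Dict[Head, Sequence[float]],
    audio_mask: Sequence[bool],
    heads: Sequence[Head],
) -> float:
    """Mean audio attention mass over the selected heads."""
    if not heads:
        raise ValueError("need at least one head")
    return sum(audio_attention_mass(attn_by_head[h], audio_mask) for h in heads) / len(heads)
```

In practice the attention rows would come from a model's per-head attention weights at the answer position, and the head set would be chosen on held-out data rather than on the example being scored.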

Findings

01. Audio attention in the specialist heads increases when audio evidence affects the model's output.

02. The intervention improves accuracy by up to +8.0 percentage points.

03. The method works on Qwen-based LALMs without retraining.

Abstract

Multimodal large language models can exhibit text dominance, over-relying on linguistic priors instead of grounding predictions in non-text inputs. One example is large audio-language models (LALMs), where decisive audio evidence can be under-utilized even when it contains important information. To address this issue, we use mechanistic interpretability to identify a small set of audio-specialist attention heads whose audio attention yields a "listening" signal. We show that this signal increases when audio evidence affects the model's output, providing an indicator of audio engagement under standard prompting. Leveraging this localization, we construct an audio-silence steering direction and apply an inference-time activation intervention to the final representation, amplifying the model's audio effect. To demonstrate the utility of this intervention, we show on MMAU that this…
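The steering step described in the abstract can be pictured with a small sketch. It assumes, without confirmation from the text shown here, that the audio-silence direction is a normalized difference of mean final representations (real audio versus silent input) and that the intervention adds a scaled copy of that direction to the final hidden vector. The function names and the strength parameter `alpha` are hypothetical, and the role of the audio-specialist heads in building the direction is not modeled.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same length")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def steering_direction(
    audio_reps: Sequence[Sequence[float]],
    silence_reps: Sequence[Sequence[float]],
) -> Vector:
    """Assumed construction: unit vector from the silence mean toward the audio mean."""
    mu_audio = mean_vector(audio_reps)
    mu_silence = mean_vector(silence_reps)
    diff = [a - s for a, s in zip(mu_audio, mu_silence)]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0.0:
        raise ValueError("audio and silence means coincide; no direction to steer along")
    return [x / norm for x in diff]


def steer(hidden: Sequence[float], direction: Sequence[float], alpha: float) -> Vector:
    """Add alpha times the steering direction to a final-layer hidden vector."""
    if len(hidden) != len(direction):
        raise ValueError("hidden vector and direction must have the same length")
    return [h + alpha * d for h, d in zip(hidden, direction)]
```

With a real model, `audio_reps` and `silence_reps` would be final representations collected from paired forward passes, and `steer` would run inside a forward hook at inference time, so no weights are updated.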

Taxonomy

Topics: Neuroscience and Music Perception · Music and Audio Processing · Speech and Audio Processing