Discovering and Causally Validating Emotion-Sensitive Neurons in Large Audio-Language Models
Xiutian Zhao, Björn Schuller, Berrak Sisman

TL;DR
This study identifies and validates emotion-sensitive neurons in large audio-language models, demonstrating their causal role in emotion recognition and enabling controllable affective responses through targeted interventions.
Contribution
It provides the first neuron-level interpretability and causal validation of emotion-sensitive neurons in large audio-language models, advancing understanding of internal emotion encoding.
Findings
Emotion-sensitive neurons exist in multiple open-source LALMs.
Ablation of these neurons impairs emotion recognition.
Gain-based interventions can steer model predictions toward specific emotions.
Abstract
Emotion is a central dimension of spoken communication, yet we still lack a mechanistic account of how modern large audio-language models (LALMs) encode it internally. We present the first neuron-level interpretability study of emotion-sensitive neurons (ESNs) in LALMs and provide causal evidence that such units exist in Qwen2.5-Omni, Kimi-Audio, and Audio Flamingo 3. Across these three widely used open-source models, we compare frequency-, entropy-, magnitude-, and contrast-based neuron selectors on multiple emotion recognition benchmarks. Using inference-time interventions, we reveal a consistent emotion-specific signature: ablating neurons selected for a given emotion disproportionately degrades recognition of that emotion while largely preserving other classes, whereas gain-based amplification steers predictions toward the target emotion. These effects arise with modest…
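The two inference-time interventions named in the abstract, ablation and gain-based amplification, can be illustrated with a minimal sketch. The abstract does not say how the paper implements them, so this sketch assumes zero-ablation and a multiplicative gain applied to a flat list of hidden activations. The names `scale_neurons`, `ablate`, `amplify`, `hidden`, and `esn_ids` are hypothetical and are not taken from the paper or from any model's codebase.

```python
from typing import Iterable, List


def scale_neurons(activations: List[float],
                  neuron_ids: Iterable[int],
                  gain: float) -> List[float]:
    """Return a copy of `activations` with the selected units multiplied by `gain`.

    Assumption: the intervention is a per-unit multiplicative rescaling.
    gain = 0.0 corresponds to zero-ablation; gain > 1.0 to amplification.
    """
    selected = set(neuron_ids)
    return [a * gain if i in selected else a
            for i, a in enumerate(activations)]


def ablate(activations: List[float], neuron_ids: Iterable[int]) -> List[float]:
    """Zero out the selected units (assumed form of ablation)."""
    return scale_neurons(activations, neuron_ids, gain=0.0)


def amplify(activations: List[float],
            neuron_ids: Iterable[int],
            gain: float = 2.0) -> List[float]:
    """Scale the selected units up by `gain` (assumed form of gain-based steering)."""
    return scale_neurons(activations, neuron_ids, gain)


# Toy hidden state with four units; units 1 and 2 play the role of
# hypothetical emotion-sensitive neurons for one target emotion.
hidden = [0.5, -1.0, 2.0, 0.25]
esn_ids = [1, 2]

ablated = ablate(hidden, esn_ids)
amplified = amplify(hidden, esn_ids, gain=2.0)
```

In a real model the same rescaling would be applied to the chosen layer's activations during the forward pass. Which layers, which units, and which gain values are used is not stated in the text available here.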
Taxonomy
Topics: Emotion and Mood Recognition · Neuroscience and Music Perception · Music and Audio Processing
