Neuron-Level Emotion Control in Speech-Generative Large Audio-Language Models

Xiutian Zhao; Ismail Rasim Ulgen; Philipp Koehn; Björn Schuller; Berrak Sisman

arXiv:2603.17231 · cs.CL · March 19, 2026

TL;DR

This paper introduces a neuron-level method for controlling emotion in speech-generative models, enabling precise, training-free emotion steering that generalizes across speakers and maintains content fidelity.

Contribution

To the authors' knowledge, this is the first work to identify emotion-sensitive neurons in large audio-language models and use them for causal, training-free emotion control at inference time.

Findings

01 Emotion-sensitive neurons can be causally manipulated for emotion control.

02 Interventions improve emotion accuracy across unseen speakers.

03 Controllability depends on selector design, mask sparsity, and filtering, among other intervention settings.

Abstract

Large audio-language models (LALMs) can produce expressive speech, yet reliable emotion control remains elusive: conversions often miss the target affect and may degrade linguistic fidelity through refusals, hallucinations, or paraphrase. We present, to our knowledge, the first neuron-level study of emotion control in speech-generative LALMs and demonstrate that compact emotion-sensitive neurons (ESNs) are causally actionable, enabling training-free emotion steering at inference time. ESNs are identified via success-filtered activation aggregation enforcing both emotion realization and content preservation. Across three LALMs (Qwen2.5-Omni-7B, MiniCPM-o 4.5, Kimi-Audio), ESN interventions yield emotion-specific gains that generalize to unseen speakers and are supported by automatic and human evaluation. Controllability depends on selector design, mask sparsity, filtering, and…
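The abstract names two steps: selecting ESNs by aggregating activations over successful conversions only, and intervening on those neurons at inference time. The sketch below illustrates that idea on toy arrays. It is not the authors' implementation: the function names `select_esn_mask` and `steer`, the difference-of-means score, the top-k selection, and the multiplicative intervention are all assumptions made for illustration.

```python
import numpy as np


def select_esn_mask(acts, emotions, success, target, k):
    """Return a boolean neuron mask for one target emotion (hypothetical helper).

    acts:     (n_samples, n_neurons) per-sample mean activations
    emotions: (n_samples,) emotion label of each conversion
    success:  (n_samples,) True where the output both realized the emotion
              and preserved the content (the success filter)
    """
    acts = np.asarray(acts, dtype=float)
    emotions = np.asarray(emotions)
    keep = np.asarray(success, dtype=bool)

    # Aggregate only over samples that passed the success filter.
    is_target = keep & (emotions == target)
    is_other = keep & (emotions != target)
    if not is_target.any() or not is_other.any():
        raise ValueError("need successful samples for target and non-target emotions")

    # Assumed selector: mean activation on the target emotion minus the rest.
    score = acts[is_target].mean(axis=0) - acts[is_other].mean(axis=0)

    # Keep the k highest-scoring neurons; k controls mask sparsity.
    mask = np.zeros(acts.shape[1], dtype=bool)
    mask[np.argsort(score)[-k:]] = True
    return mask


def steer(hidden, mask, alpha):
    """Scale the masked neurons by alpha and leave the others unchanged."""
    hidden = np.asarray(hidden, dtype=float)
    return np.where(mask, hidden * alpha, hidden)


# Toy data: 4 conversions, 3 neurons; the last conversion failed the filter.
acts = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.1], [0.0, 1.0, 0.0], [5.0, 5.0, 5.0]]
emotions = ["happy", "happy", "sad", "happy"]
success = [True, True, True, False]

mask = select_esn_mask(acts, emotions, success, target="happy", k=1)
steered = steer([2.0, 3.0, 4.0], mask, alpha=1.5)
```

In a real model the mask would be applied inside a forward hook on the chosen layers rather than to a standalone vector; the choice of selector, `k`, and `alpha` corresponds to the design factors the abstract says controllability depends on.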

Taxonomy

Topics: Emotion and Mood Recognition · Neuroscience and Music Perception · Music and Audio Processing