EmoShift: Lightweight Activation Steering for Enhanced Emotion-Aware Speech Synthesis

Li Zhou; Hao Jiang; Junjie Li; Tianrui Wang; Haizhou Li

arXiv:2601.22873 · eess.AS · February 2, 2026

TL;DR

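EmoShift introduces a lightweight activation-steering method for emotion-aware speech synthesis. It learns a steering vector for each target emotion in the output embedding space, enabling precise emotional control with only about 10M trainable parameters and outperforming both zero-shot and fully fine-tuned baselines.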

Contribution

The paper proposes EmoShift, a novel low-parameter framework built around an EmoSteer layer, which models emotion-specific latent features in speech synthesis by learning a per-emotion offset in the output embedding space.

Findings

01

Outperforms zero-shot and fully fine-tuned baselines in both objective and subjective evaluations

02

Trains only 10M parameters, less than 1/30 of the parameters updated by full fine-tuning

03

Enhances emotional expressiveness while maintaining naturalness and speaker similarity
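Finding 02 can be turned into a rough bound on the size of the fully fine-tuned baseline. Writing $N_{\text{full}}$ for the number of parameters updated by full fine-tuning (our notation, not the paper's) and using the 10M trainable parameters reported in the abstract:

```latex
10\,\mathrm{M} < \tfrac{1}{30}\, N_{\text{full}}
\quad\Longrightarrow\quad
N_{\text{full}} > 30 \times 10\,\mathrm{M} = 300\,\mathrm{M}
```

That is, full fine-tuning in this comparison updates more than 300M parameters, while EmoShift trains roughly 10M.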

Abstract

Achieving precise and controllable emotional expression is crucial for producing natural and context-appropriate speech in text-to-speech (TTS) synthesis. However, many emotion-aware TTS systems, including large language model (LLM)-based designs, rely on scaling fixed emotion embeddings or external guidance, limiting their ability to model emotion-specific latent characteristics. To address this gap, we present EmoShift, a lightweight activation-steering framework incorporating an EmoSteer layer, which learns a steering vector for each target emotion in the output embedding space to capture its latent offset and maintain stable, appropriate expression across utterances and categories. With only 10M trainable parameters, less than 1/30 of full fine-tuning, EmoShift outperforms zero-shot and fully fine-tuned baselines in objective and subjective evaluations, enhancing emotional…
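To make the steering idea concrete, here is a minimal pure-Python sketch, assuming the EmoSteer layer can be reduced to adding a learned emotion-specific offset to each output embedding of a frozen backbone. The class name `EmoSteerSketch`, the optional `alpha` strength, and the squared-error `sgd_step` objective are hypothetical illustrations, not the paper's implementation; the sketch also does not attempt to reproduce the paper's actual parameterization or its 10M-parameter budget.

```python
from typing import Dict, List

Vector = List[float]


class EmoSteerSketch:
    """Hypothetical additive steering layer: one learnable offset per emotion.

    The backbone's output embeddings are treated as frozen inputs; only the
    per-emotion steering vectors are trainable.
    """

    def __init__(self, emotions: List[str], dim: int) -> None:
        self.dim = dim
        # Zero-initialised vectors leave the backbone's embeddings unchanged.
        self.steering: Dict[str, Vector] = {e: [0.0] * dim for e in emotions}

    def set_vector(self, emotion: str, vec: Vector) -> None:
        if len(vec) != self.dim:
            raise ValueError("steering vector has the wrong dimension")
        self.steering[emotion] = list(vec)

    def __call__(self, hidden: List[Vector], emotion: str,
                 alpha: float = 1.0) -> List[Vector]:
        # Shift every output embedding (one per token or frame) by the same
        # emotion-specific offset; alpha is an assumed strength knob.
        v = self.steering[emotion]
        return [[h + alpha * d for h, d in zip(frame, v)] for frame in hidden]

    def sgd_step(self, hidden: List[Vector], target: List[Vector],
                 emotion: str, lr: float) -> None:
        # Toy objective (assumption, alpha fixed to 1): squared error between
        # steered and target embeddings, averaged over frames. The gradient
        # w.r.t. v_j is (2/n) * sum_i (h_ij + v_j - t_ij); `hidden` is frozen.
        v = self.steering[emotion]
        n = len(hidden)
        grad = [0.0] * self.dim
        for frame, tgt in zip(hidden, target):
            for j in range(self.dim):
                grad[j] += 2.0 * (frame[j] + v[j] - tgt[j]) / n
        self.steering[emotion] = [vj - lr * g for vj, g in zip(v, grad)]

    def num_parameters(self) -> int:
        # Trainable parameters: one dim-sized vector per emotion.
        return len(self.steering) * self.dim


if __name__ == "__main__":
    layer = EmoSteerSketch(["neutral", "happy"], dim=3)
    layer.set_vector("happy", [0.5, -0.5, 0.0])
    frames = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]
    print(layer(frames, "happy"))    # [[1.5, 1.5, 3.0], [0.5, -0.5, 0.0]]
    print(layer(frames, "neutral"))  # unchanged: zero offset
    print(layer.num_parameters())    # 6
```

Because only the per-emotion vectors are updated, switching the target emotion at inference is a table lookup, and the backbone's weights never change.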

Taxonomy

Topics: Speech Recognition and Synthesis · Mental Health via Writing · Emotion and Mood Recognition