ASR for Affective Speech: Investigating Impact of Emotion and Speech Generative Strategy

Ya-Tse Wu; Chi-Chun Lee

arXiv:2601.20319 · eess.AS · January 29, 2026

TL;DR

This paper examines how emotional speech and generative strategies affect ASR accuracy, and proposes targeted data augmentation that improves recognition of emotional speech without noticeable degradation on clean LibriSpeech utterances.

Contribution

It introduces two generative strategies for constructing ASR fine-tuning subsets from synthetic emotional speech, one based on transcription correctness and one on emotional salience, which improve WER on real emotional datasets.

Findings

1. Consistent WER improvements on real emotional speech datasets.

2. No noticeable degradation on clean LibriSpeech utterances.

3. The combined strategy yields the strongest gains, particularly for expressive speech.

Abstract

This work investigates how emotional speech and generative strategies affect ASR performance. We analyze speech synthesized from three emotional TTS models and find that substitution errors dominate, with emotional expressiveness varying across models. Based on these insights, we introduce two generative strategies: one using transcription correctness and another using emotional salience, to construct fine-tuning subsets. Results show consistent WER improvements on real emotional datasets without noticeable degradation on clean LibriSpeech utterances. The combined strategy achieves the strongest gains, particularly for expressive speech. These findings highlight the importance of targeted augmentation for building emotion-aware ASR systems.
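
The two ideas in the abstract, breaking ASR errors down by type and selecting fine-tuning subsets by transcription correctness and emotional salience, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `SynthUtterance` record, the `salience` score, the thresholds, and the `select_subset` helper are all hypothetical names introduced for illustration. The abstract does not state which direction the filters run or how the two strategies are combined, so keeping correctly transcribed, highly salient utterances and combining by intersection are assumptions.

```python
from dataclasses import dataclass


def error_counts(ref: list[str], hyp: list[str]) -> tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) of one minimum-edit alignment."""
    # dp[i][j] = (total_edits, subs, dels, ins) for ref[:i] versus hyp[:j]
    dp = [[(0, 0, 0, 0)] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, 0, i, 0)          # only deletions
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, 0, 0, j)          # only insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]   # match, no new error
                continue
            t, s, d, n = dp[i - 1][j - 1]
            sub = (t + 1, s + 1, d, n)
            t, s, d, n = dp[i - 1][j]
            dele = (t + 1, s, d + 1, n)
            t, s, d, n = dp[i][j - 1]
            ins = (t + 1, s, d, n + 1)
            dp[i][j] = min(sub, dele, ins, key=lambda c: c[0])
    _, s, d, n = dp[-1][-1]
    return s, d, n


def wer(ref: str, hyp: str) -> float:
    """Word error rate: (S + D + I) / number of reference words."""
    ref_words = ref.lower().split()
    s, d, n = error_counts(ref_words, hyp.lower().split())
    return (s + d + n) / max(len(ref_words), 1)


@dataclass
class SynthUtterance:
    text: str            # text given to the TTS model
    asr_hypothesis: str  # transcript produced by a baseline ASR model
    salience: float      # hypothetical emotion-salience score in [0, 1]


def select_subset(utts, max_wer=0.0, min_salience=0.5, mode="combined"):
    """Filter synthetic utterances; filter directions and thresholds are assumptions."""
    keep = []
    for u in utts:
        correct = wer(u.text, u.asr_hypothesis) <= max_wer
        salient = u.salience >= min_salience
        if mode == "correctness":
            ok = correct
        elif mode == "salience":
            ok = salient
        else:                            # assumed combination: intersection
            ok = correct and salient
        if ok:
            keep.append(u)
    return keep


utts = [
    SynthUtterance("i am so happy", "i am so happy", 0.9),
    SynthUtterance("i am so happy", "i am so happy", 0.2),
    SynthUtterance("leave me alone", "leave me a loan", 0.8),
]
combined = select_subset(utts, mode="combined")
```

In this toy input only the first utterance passes both filters. A real pipeline would take `asr_hypothesis` from an actual ASR model and `salience` from an emotion classifier or similar scorer, neither of which is specified here.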

Taxonomy

Topics: Speech Recognition and Synthesis · Emotion and Mood Recognition · Topic Modeling