EME-TTS: Unlocking the Emphasis and Emotion Link in Speech Synthesis

Haoxun Li; Leyuan Qu; Jiaxi Hu; Taihao Li

arXiv:2507.12015 · cs.SD · July 17, 2025

TL;DR

EME-TTS is a framework that integrates emphasis and emotion in speech synthesis. It uses weakly supervised learning and an Emphasis Perception Enhancement (EPE) block to make emotional speech more expressive and to keep emphasis stable across emotions.

Contribution

The paper presents EME-TTS, a framework that improves emotional speech synthesis by using emphasis more effectively and by keeping the target emphasis perceptually clear across emotions.

Findings

1. Enables more natural emotional speech synthesis.
2. Maintains stable and distinguishable emphasis across emotions.
3. Uses weakly supervised learning with emphasis pseudo-labels.

Abstract

In recent years, emotional Text-to-Speech (TTS) synthesis and emphasis-controllable speech synthesis have advanced significantly. However, their interaction remains underexplored. We propose Emphasis Meets Emotion TTS (EME-TTS), a novel framework designed to address two key research questions: (1) how to effectively utilize emphasis to enhance the expressiveness of emotional speech, and (2) how to maintain the perceptual clarity and stability of target emphasis across different emotions. EME-TTS employs weakly supervised learning with emphasis pseudo-labels and variance-based emphasis features. Additionally, the proposed Emphasis Perception Enhancement (EPE) block enhances the interaction between emotional signals and emphasis positions. Experimental results show that EME-TTS, when combined with large language models for emphasis position prediction, enables more natural emotional…
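The abstract does not specify how the emphasis pseudo-labels or the variance-based emphasis features are computed. The following is a rough illustration only. It assumes that word-level pitch, energy and duration (the "variance" features familiar from FastSpeech 2-style TTS) have already been extracted. Each feature is z-normalized within the utterance, and a word is marked as emphasized when its averaged z-score exceeds a threshold. The `WordProsody` type, the `emphasis_pseudo_labels` function, the equal weighting of the three features and the `threshold=1.0` default are all hypothetical. They are not the EME-TTS procedure.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class WordProsody:
    # Hypothetical per-word inputs; assumed to be extracted beforehand
    # (e.g., from a pitch tracker and a forced aligner).
    word: str
    pitch: float     # mean F0 over the word, in Hz
    energy: float    # mean frame energy over the word
    duration: float  # word duration, in seconds


def zscores(values):
    """Normalize values within one utterance (population std)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Flat prosody: no word stands out on this feature.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def emphasis_pseudo_labels(words, threshold=1.0):
    """Hypothetical rule, not the paper's: average the z-scores of pitch,
    energy and duration per word; label 1 (emphasized) above threshold."""
    if not words:
        return [], []
    zp = zscores([w.pitch for w in words])
    ze = zscores([w.energy for w in words])
    zd = zscores([w.duration for w in words])
    features = [(p + e + d) / 3 for p, e, d in zip(zp, ze, zd)]
    labels = [1 if f > threshold else 0 for f in features]
    return features, labels


if __name__ == "__main__":
    utterance = [
        WordProsody("I", 180.0, 0.50, 0.10),
        WordProsody("never", 260.0, 0.90, 0.35),
        WordProsody("said", 185.0, 0.50, 0.15),
        WordProsody("that", 175.0, 0.45, 0.12),
    ]
    _, labels = emphasis_pseudo_labels(utterance)
    print(labels)  # → [0, 1, 0, 0]
```

Per-utterance normalization means that emphasis is judged relative to the speaker's own baseline in that sentence. An emotion that raises pitch globally, such as anger, therefore does not by itself mark every word as emphasized. This is one plausible reason to use relative rather than absolute prosodic features, but it is not a claim about how the paper handles it.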

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Emotion and Mood Recognition