Expressive Prompting: Improving Emotion Intensity and Speaker Consistency in Zero-Shot TTS

Haoyu Wang; Chunyu Qiang; Tianrui Wang; Cheng Gong; Yu Jiang; Yuheng Lu; Chen Zhang; Longbiao Wang; Jianwu Dang

arXiv:2409.18512 · cs.SD · April 6, 2026

TL;DR

This paper introduces a two-stage prompt selection method for zero-shot speech synthesis that enhances emotional intensity and speaker consistency by evaluating prompts with prosodic, perceptual, and semantic metrics.

Contribution

The proposed static and dynamic prompt selection strategy improves expressive speech synthesis by ensuring stable speaker identity and appropriate emotional cues in zero-shot TTS.

Findings

01. Enhanced emotional expression in synthesized speech.
02. Improved speaker similarity and stability.
03. Effective prompt selection demonstrated through experiments.

Abstract

Recent advancements in speech synthesis have enabled large language model (LLM)-based systems to perform zero-shot generation with controllable content, timbre, speaker identity, and emotion through input prompts. As a result, these models rely heavily on prompt design to guide the generation process. However, existing prompt selection methods often fail to ensure that prompts contain sufficiently stable speaker identity cues and appropriate emotional intensity indicators, both of which are crucial for expressive speech synthesis. To address this challenge, we propose a two-stage prompt selection strategy specifically designed for expressive speech synthesis. In the static stage (before synthesis), we first evaluate prompt candidates using pitch-based prosodic features, perceptual audio quality, and text-emotion coherence scores assigned by an LLM. We further assess the candidates under a…
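The static stage described in the abstract scores each prompt candidate on three kinds of signal before synthesis. The abstract does not say how those scores are combined, so the following is only a minimal sketch under stated assumptions: the `PromptCandidate` fields, the min-max normalization, the equal weights, and the "higher is better" direction for every feature are all hypothetical choices made for illustration, not the authors' method.

```python
from dataclasses import dataclass


@dataclass
class PromptCandidate:
    name: str
    pitch_std: float   # hypothetical pitch-based prosodic feature (e.g. F0 std in Hz)
    quality: float     # hypothetical perceptual audio quality score (e.g. 1 to 5)
    coherence: float   # hypothetical LLM-assigned text-emotion coherence (0 to 1)


def min_max(values):
    """Rescale values to [0, 1]; return 0.5 for all if they are identical."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_static(candidates, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Rank candidates by a weighted sum of normalized features.

    Assumption: larger values are better for every feature. Returns a list
    of (name, score) pairs sorted from best to worst.
    """
    if not candidates:
        return []
    columns = [
        min_max([c.pitch_std for c in candidates]),
        min_max([c.quality for c in candidates]),
        min_max([c.coherence for c in candidates]),
    ]
    scores = [
        sum(w * col[i] for w, col in zip(weights, columns))
        for i in range(len(candidates))
    ]
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return [(candidates[i].name, scores[i]) for i in order]


if __name__ == "__main__":
    pool = [
        PromptCandidate("a", pitch_std=10.0, quality=3.0, coherence=0.2),
        PromptCandidate("b", pitch_std=40.0, quality=4.5, coherence=0.9),
        PromptCandidate("c", pitch_std=25.0, quality=4.0, coherence=0.5),
    ]
    for name, score in rank_static(pool):
        print(f"{name}: {score:.3f}")
```

Normalizing per feature keeps one scale (for example, pitch in Hz) from dominating the others. A real pipeline would need measured features and a principled choice of weights and feature directions.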

Code & Models

Repositories

https://whyrrrrun.github.io/ExpPro.github.io
