Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech

Rui Liu; Bin Liu; Haizhou Li

arXiv:2309.11724 · cs.AI · September 22, 2023 · 1 citation

TL;DR

This paper introduces EmoPP, an emotion-aware prosodic phrasing model for expressive TTS, which accurately captures emotional cues to improve naturalness and expressiveness in synthesized speech.

Contribution

The study proposes a novel emotion-aware prosodic phrasing model, EmoPP, that effectively mines emotional cues to enhance expressive speech synthesis.

Findings

01 EmoPP outperforms baseline models in objective and subjective evaluations.

02 A strong correlation between emotion and prosodic phrasing is validated on the ESD dataset.

03 EmoPP improves emotion expressiveness in TTS.

Abstract

Prosodic phrasing is crucial to the naturalness and intelligibility of end-to-end Text-to-Speech (TTS). Natural speech carries both linguistic and emotional prosody. Because the study of prosodic phrasing has been linguistically motivated, prosodic phrasing for expressive emotion rendering has not been well studied. In this paper, we propose an emotion-aware prosodic phrasing model, termed EmoPP, to mine the emotional cues of an utterance accurately and predict appropriate phrase breaks. We first conduct objective observations on the ESD dataset to validate the strong correlation between emotion and prosodic phrasing. Objective and subjective evaluations then show that EmoPP outperforms all baselines and achieves remarkable performance in terms of emotion expressiveness. The audio samples and the code are available at https://github.com/AI-S2-Lab/EmoPP.
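
To make the kind of objective observation mentioned in the abstract more concrete, here is a minimal sketch of how one might compare phrase-break density across emotions. It is not the authors' analysis: the `#` break marker, the `break_rate` helper, and the toy utterances are all assumptions made for illustration, and the abstract does not say how breaks are annotated or which statistic is used.

```python
from collections import defaultdict

BREAK = "#"  # hypothetical phrase-break marker, not the paper's notation


def break_rate(utterances):
    """Return phrase breaks per word for each emotion label.

    `utterances` is an iterable of (emotion, tokens) pairs, where tokens
    is a list of words with BREAK markers inserted at phrase boundaries.
    """
    breaks = defaultdict(int)
    words = defaultdict(int)
    for emotion, tokens in utterances:
        breaks[emotion] += sum(1 for t in tokens if t == BREAK)
        words[emotion] += sum(1 for t in tokens if t != BREAK)
    # Skip emotions with no words to avoid dividing by zero.
    return {e: breaks[e] / words[e] for e in words if words[e] > 0}


# Made-up toy annotations, NOT taken from ESD or from the paper.
toy = [
    ("neutral", ["the", "train", "leaves", "at", "noon"]),
    ("sad", ["the", "train", BREAK, "leaves", BREAK, "at", "noon"]),
]
rates = break_rate(toy)
```

On real data, a consistent gap in such a rate between emotion categories would be one simple indication that phrasing depends on emotion; the toy values above carry no evidential weight.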

Code & Models

Repositories

ai-s2-lab/emopp (Official)

Taxonomy

Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Speech and Dialogue Systems