Adjusting Pleasure-Arousal-Dominance for Continuous Emotional Text-to-speech Synthesizer

Azam Rabiee; Tae-Ho Kim; Soo-Young Lee

arXiv:1906.05507 · eess.AS · November 28, 2022 · 5 cites

Open Access

TL;DR

This paper proposes a method to incorporate and adjust the Pleasure-Arousal-Dominance emotional dimensions into an end-to-end neural TTS system, enabling more nuanced and continuous emotional speech synthesis.

Contribution

It introduces an optimized neural architecture for integrating PAD emotional dimensions into Tacotron-based TTS and presents a method for adjusting these dimensions for synthesis.

Findings

01 An optimal network architecture for injecting the 3D PAD values was identified empirically

02 PAD values are adjusted for the purpose of speech synthesis

03 Training on continuous PAD dimensions opens the possibility of emotional TTS with unlimited emotion categories

Abstract

Emotion is not limited to discrete categories of happy, sad, angry, fear, disgust, surprise, and so on. Instead, each emotion category is projected into a set of nearly independent dimensions, named pleasure (or valence), arousal, and dominance, known as PAD. The value of each dimension varies from -1 to 1, such that the neutral emotion is in the center with all-zero values. Training an emotional continuous text-to-speech (TTS) synthesizer on the independent dimensions provides the possibility of emotional speech synthesis with unlimited emotion categories. Our end-to-end neural speech synthesizer is based on the well-known Tacotron. Empirically, we have found the optimum network architecture for injecting the 3D PADs. Moreover, the PAD values are adjusted for the speech synthesis purpose.
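The PAD representation described in the abstract can be illustrated with a minimal Python sketch. It only encodes what the abstract states: three dimensions, each in the range -1 to 1, with neutral at the all-zero point. Everything else is an assumption made for illustration: the names `PAD`, `interpolate`, and `inject_by_concatenation` are hypothetical, the example PAD values are invented, and appending the PAD values to every encoder frame is just one plausible injection point, not necessarily the architecture the authors found to be optimal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PAD:
    """A point in pleasure-arousal-dominance space, each axis in [-1, 1]."""
    pleasure: float
    arousal: float
    dominance: float

    def __post_init__(self):
        # Reject values outside the range given in the abstract.
        for name in ("pleasure", "arousal", "dominance"):
            value = getattr(self, name)
            if not -1.0 <= value <= 1.0:
                raise ValueError(f"{name}={value} is outside [-1, 1]")

    def as_list(self):
        return [self.pleasure, self.arousal, self.dominance]


# Neutral emotion sits at the center of the space.
NEUTRAL = PAD(0.0, 0.0, 0.0)


def interpolate(a, b, t):
    """Linearly blend two PAD points; t=0 gives a, t=1 gives b."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must be in [0, 1]")
    blended = [(1.0 - t) * x + t * y for x, y in zip(a.as_list(), b.as_list())]
    # Clamp to guard against floating-point rounding at the range edges.
    return PAD(*[min(1.0, max(-1.0, v)) for v in blended])


def inject_by_concatenation(encoder_frames, pad):
    """Append the 3 PAD values to every encoder frame.

    Hypothetical injection point, chosen only for illustration.
    """
    return [list(frame) + pad.as_list() for frame in encoder_frames]


if __name__ == "__main__":
    # Invented example values, not taken from the paper.
    target = PAD(0.8, -0.4, 0.2)
    halfway = interpolate(NEUTRAL, target, 0.5)
    frames = [[0.1, 0.2], [0.3, 0.4]]  # stand-in for encoder outputs
    print(halfway)
    print(inject_by_concatenation(frames, halfway))
```

Because the conditioning input is a continuous vector rather than a category label, any point between neutral and a target emotion is a valid input, which is what makes emotion categories effectively unlimited in this formulation.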

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing