Emotional speech synthesis with rich and granularized control

Se-Yun Um; Sangshin Oh; Kyungguen Byun; Inseon Jang; Chunghyun Ahn; Hong-Goo Kang

arXiv:1911.01635 · eess.AS · November 7, 2019

TL;DR

This paper presents a novel emotion control method for end-to-end TTS systems. It uses an inter-to-intra emotional distance ratio to select emotion embeddings and an interpolation technique to vary emotion intensity, improving emotional expressiveness and enabling fine-grained control.

Contribution

It introduces an inter-to-intra emotional distance ratio algorithm for choosing emotion embedding vectors and an interpolation technique for gradually varying emotion intensity in speech synthesis.

Findings

1. The proposed method outperforms conventional approaches in subjective evaluations.
2. It enables gradual emotion intensity modulation in synthesized speech.
3. The approach improves emotional expressiveness and controllability in TTS.
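The second finding, gradual intensity modulation, can be pictured as blending a target-emotion embedding with a neutral one. The sketch below uses plain linear interpolation controlled by a weight `alpha`. This is an assumption for illustration only, since this page does not state which interpolation scheme the paper uses. `interpolate_emotion` and `intensity_sweep` are hypothetical helpers, not the authors' code.

```python
from typing import List, Sequence


def interpolate_emotion(
    neutral: Sequence[float], emotion: Sequence[float], alpha: float
) -> List[float]:
    """Linearly blend a neutral embedding with a target-emotion embedding.

    alpha = 0.0 returns the neutral embedding; alpha = 1.0 returns the
    target-emotion embedding. Linear blending is an illustrative assumption.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(neutral) != len(emotion):
        raise ValueError("embeddings must have the same dimension")
    return [(1.0 - alpha) * n + alpha * e for n, e in zip(neutral, emotion)]


def intensity_sweep(
    neutral: Sequence[float], emotion: Sequence[float], steps: int
) -> List[List[float]]:
    """Return `steps` embeddings evenly spaced from neutral to full emotion."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [interpolate_emotion(neutral, emotion, k / (steps - 1)) for k in range(steps)]


if __name__ == "__main__":
    # Toy 3-dimensional vectors; learned emotion embeddings are much larger.
    neutral = [0.0, 0.0, 0.0]
    angry = [1.0, -0.5, 2.0]
    for vec in intensity_sweep(neutral, angry, steps=5):
        print(vec)
```

In this sketch, each vector in the sweep would be supplied to the synthesizer as its emotion condition, yielding speech whose intensity steps from neutral to the full target emotion. How the embedding actually conditions the TTS model is also an assumption here.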

Abstract

This paper proposes an effective emotion control method for an end-to-end text-to-speech (TTS) system. To flexibly control the distinct characteristics of a target emotion category, it is essential to choose appropriate embedding vectors to serve as the TTS input. We introduce an inter-to-intra emotional distance ratio algorithm for selecting embedding vectors that minimize the distance to the target emotion category while maximizing the distance to the other emotion categories. To further enhance the expressiveness of the target speech, we also introduce an effective interpolation technique that allows the intensity of a target emotion to be shifted gradually toward that of neutral speech. Subjective evaluation results in terms of emotional expressiveness and controllability show that the proposed algorithm outperforms conventional methods.
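The distance-ratio criterion in the abstract can be illustrated with a small sketch. None of the following details are specified on this page; they are assumptions:

- Each emotion category has several candidate embedding vectors.
- Distance is Euclidean.
- Intra- and inter-category distances are averaged.
- The representative vector is the existing candidate with the largest inter-to-intra ratio.

`distance_ratio` and `select_representative` are hypothetical names, not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple

Embedding = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_ratio(index: int, target: str, embeddings: Dict[str, List[Embedding]]) -> float:
    """Inter-to-intra distance ratio of embeddings[target][index].

    intra: mean distance to the other candidates of the same emotion.
    inter: mean distance to every candidate of all other emotions.
    """
    candidate = embeddings[target][index]
    same = [e for i, e in enumerate(embeddings[target]) if i != index]
    others = [e for emo, vecs in embeddings.items() if emo != target for e in vecs]
    if not same or not others:
        raise ValueError("need >= 2 candidates for the target and >= 1 other emotion")
    intra = sum(euclidean(candidate, e) for e in same) / len(same)
    inter = sum(euclidean(candidate, e) for e in others) / len(others)
    # A candidate identical to all of its own-category peers gets an unbounded ratio.
    return inter / intra if intra > 0 else math.inf


def select_representative(target: str, embeddings: Dict[str, List[Embedding]]) -> Embedding:
    """Pick the target-emotion candidate with the largest inter-to-intra ratio."""
    candidates = embeddings[target]
    best = max(range(len(candidates)), key=lambda i: distance_ratio(i, target, embeddings))
    return candidates[best]


if __name__ == "__main__":
    # Toy 2-D vectors standing in for learned emotion embeddings.
    toy = {
        "happy": [(0.0, 0.0), (1.0, 0.0), (0.5, 0.1)],
        "sad": [(10.0, 0.0), (11.0, 0.0)],
    }
    print(select_representative("happy", toy))  # (0.5, 0.1)
```

Maximizing the ratio favors a candidate that stays close to the rest of its own category while lying far from the other categories. This is the trade-off the abstract describes. In the toy data, the central "happy" point wins because it has the smallest mean distance to its peers.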

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques