Speech Synthesis with Mixed Emotions

Kun Zhou; Berrak Sisman; Rajib Rana; B. W. Schuller; Haizhou Li

arXiv:2208.05890 · cs.CL · January 2, 2023 · 1 citation

Open Access

TL;DR

This paper introduces a novel speech synthesis framework that can generate speech with mixed emotions by measuring and controlling emotional differences, enabling more nuanced and realistic emotional speech synthesis.

Contribution

It presents the first method for modeling, synthesizing, and evaluating mixed emotions in speech using a sequence-to-sequence framework with a novel emotion difference formulation.

Findings

01 Effective control of mixed emotions in synthesized speech

02 Validated through objective and subjective evaluations

03 First study to model and synthesize mixed emotions

Abstract

Emotional speech synthesis aims to synthesize human voices with various emotional effects. The current studies are mostly focused on imitating an averaged style belonging to a specific emotion type. In this paper, we seek to generate speech with a mixture of emotions at run-time. We propose a novel formulation that measures the relative difference between the speech samples of different emotions. We then incorporate our formulation into a sequence-to-sequence emotional text-to-speech framework. During the training, the framework does not only explicitly characterize emotion styles, but also explores the ordinal nature of emotions by quantifying the differences with other emotions. At run-time, we control the model to produce the desired emotion mixture by manually defining an emotion attribute vector. The objective and subjective evaluations have validated the effectiveness of the…
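To make the run-time control step concrete, here is a minimal sketch of how a manually defined emotion attribute vector could be built and used. It is an illustration under stated assumptions, not the authors' implementation: the emotion inventory `EMOTIONS`, the helper names `make_attribute_vector` and `condition_from_mixture`, the [0, 1] intensity range, and the weighted-sum conditioning are all hypothetical, since the abstract does not specify how the vector enters the sequence-to-sequence model.

```python
import numpy as np

# Hypothetical emotion inventory; the abstract does not list the emotion set.
EMOTIONS = ("angry", "happy", "sad", "surprise")


def make_attribute_vector(mixture):
    """Turn a {emotion: intensity} dict into a fixed-order attribute vector.

    Assumption: each intensity lies in [0, 1]; unlisted emotions default to 0.
    """
    unknown = set(mixture) - set(EMOTIONS)
    if unknown:
        raise ValueError(f"unknown emotions: {sorted(unknown)}")
    vec = np.array([float(mixture.get(e, 0.0)) for e in EMOTIONS])
    if np.any(vec < 0.0) or np.any(vec > 1.0):
        raise ValueError("intensities must lie in [0, 1]")
    return vec


def condition_from_mixture(attribute_vector, style_embeddings):
    """Combine per-emotion style embeddings into one conditioning vector.

    Assumption: a plain weighted sum; the real model may condition differently.
    """
    return attribute_vector @ style_embeddings


# Stand-in style embeddings (one row per emotion); a trained model would supply these.
rng = np.random.default_rng(0)
style_embeddings = rng.normal(size=(len(EMOTIONS), 8))

# Request a mostly happy voice with a trace of sadness.
vec = make_attribute_vector({"happy": 0.8, "sad": 0.3})
cond = condition_from_mixture(vec, style_embeddings)
print(vec, cond.shape)
```

The attribute vector is the only thing set by hand here, which mirrors the control interface the abstract describes; everything downstream of it is a placeholder.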

Taxonomy

Topics: Speech Recognition and Synthesis · Emotion and Mood Recognition · Speech and Audio Processing