CASEIN: Cascading Explicit and Implicit Control for Fine-grained Emotion Intensity Regulation

Yuhao Cui; Xiongwei Wang; Zhongzhou Zhao; Wei Zhou; Haiqing Chen

arXiv:2307.00020·cs.SD·July 4, 2023

Open Access

TL;DR

This paper introduces CASEIN, a novel framework for fine-grained emotion intensity regulation in speech synthesis, combining explicit and implicit controls to improve controllability and naturalness, especially for mixed emotions.

Contribution

The paper proposes a new cascaded control framework that disentangles emotion manifolds and enables precise regulation of multiple emotion intensities in speech synthesis.

Findings

01. Outperforms existing methods in controllability and naturalness

02. First to achieve fine-grained control over mixed emotion intensities

03. Reduces bias in emotion intensity learning

Abstract

Existing fine-grained intensity regulation methods rely on explicit control through predicted emotion probabilities. However, these high-level semantic probabilities are often inaccurate and unsmooth at the phoneme level, leading to bias in learning. The problem is most pronounced when we attempt to mix multiple emotion intensities for specific phonemes, which markedly reduces the controllability and naturalness of the synthesis. To address this issue, we propose the CAScaded Explicit and Implicit coNtrol framework (CASEIN), which leverages accurate disentanglement of emotion manifolds from the reference speech to learn the implicit representation at a lower semantic level. This representation bridges the semantic gap between explicit probabilities and the synthesis model, reducing bias in learning. In experiments, CASEIN surpasses existing methods in both controllability and naturalness. Notably,…
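To make the notion of explicit phoneme-level control concrete, here is a minimal sketch of what a mixed-emotion intensity track might look like as data. It is not the CASEIN model or the authors' code: the emotion label set, the function names `mix_intensities` and `roughness`, and the roughness measure itself are all assumptions introduced for illustration only.

```python
from typing import Dict, List

# Hypothetical label set; the abstract does not list the emotions used.
EMOTIONS = ["happy", "sad", "angry"]


def mix_intensities(
    phonemes: List[str],
    targets: Dict[int, Dict[str, float]],
) -> List[List[float]]:
    """Build one intensity vector per phoneme.

    `targets` maps a phoneme index to the emotion intensities requested
    for it. Phonemes without an entry stay neutral (all zeros).
    """
    rows = []
    for i in range(len(phonemes)):
        spec = targets.get(i, {})
        row = [float(spec.get(e, 0.0)) for e in EMOTIONS]
        if any(v < 0.0 or v > 1.0 for v in row):
            raise ValueError(f"intensity out of [0, 1] at phoneme {i}")
        rows.append(row)
    return rows


def roughness(rows: List[List[float]]) -> float:
    """Mean absolute change between adjacent phonemes.

    An illustrative (assumed) proxy for how unsmooth a control track is:
    0.0 means the track is constant across phonemes.
    """
    if len(rows) < 2:
        return 0.0
    total = sum(
        abs(a - b)
        for prev, cur in zip(rows, rows[1:])
        for a, b in zip(prev, cur)
    )
    return total / ((len(rows) - 1) * len(EMOTIONS))


if __name__ == "__main__":
    phonemes = ["HH", "AH", "L", "OW"]
    # Request a happy/sad mixture on the second phoneme only.
    track = mix_intensities(phonemes, {1: {"happy": 0.8, "sad": 0.2}})
    for p, row in zip(phonemes, track):
        print(p, row)
    print("roughness:", roughness(track))
```

A track like this, with an abrupt jump on a single phoneme, is the kind of explicit signal the abstract describes as hard for a synthesis model to follow directly. The abstract's proposal is to learn an implicit, lower-level representation that sits between such a signal and the synthesizer.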

Taxonomy

Topics: Emotion and Mood Recognition