StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis

Rui Liu; Berrak Sisman; Haizhou Li

arXiv:2110.03156 · cs.SD · October 11, 2021

TL;DR

StrengthNet is a deep learning model designed to accurately assess emotion strength in speech, improving generalization across different datasets for more realistic emotional speech synthesis.

Contribution

The paper introduces StrengthNet, a multi-task learning framework with data augmentation, enhancing emotion strength prediction accuracy and generalization in speech synthesis.

Findings

1. High correlation between predicted and ground truth emotion strength.
2. Effective generalization to unseen speech data.
3. Improved emotion strength assessment accuracy.
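
The first finding is a statement about correlation between predicted and ground truth strength scores. As a minimal sketch of how such agreement could be measured, the function below computes Pearson's linear correlation coefficient in plain Python. The choice of Pearson's coefficient and the name `pearson_correlation` are assumptions made for illustration; this summary does not state which correlation metric the paper reports.

```python
import math


def pearson_correlation(predicted, reference):
    """Pearson correlation between two equal-length score sequences.

    Hypothetical helper for illustration; the metric is an assumption,
    not taken from the paper.
    """
    if len(predicted) != len(reference) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_p == 0 or var_r == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_r)
```

A value near 1 means the predicted scores rise and fall with the reference scores; a value near 0 means no linear relationship.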

Abstract

Recently, emotional speech synthesis has achieved remarkable performance. The emotion strength of synthesized speech can be controlled flexibly using a strength descriptor, which is obtained by an emotion attribute ranking function. However, a ranking function trained on specific data generalizes poorly, which limits its applicability to more realistic cases. In this paper, we propose a deep learning based emotion strength assessment network for strength prediction, referred to as StrengthNet. Our model follows a multi-task learning framework with a structure that includes an acoustic encoder, a strength predictor and an auxiliary emotion predictor. A data augmentation strategy is used to improve model generalization. Experiments show that the emotion strength predicted by the proposed StrengthNet is highly correlated with ground truth scores for seen and…
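
The abstract describes a multi-task setup in which a strength predictor is trained alongside an auxiliary emotion predictor. The sketch below shows one common way to combine two such objectives for a single utterance: a regression loss on the strength score plus a weighted classification loss on the emotion label. The specific losses (absolute error and cross-entropy), the weight `aux_weight`, and the function names are all assumptions made for illustration; the abstract does not specify the training objective.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multitask_loss(pred_strength, true_strength,
                   emotion_logits, true_emotion, aux_weight=1.0):
    """Combined loss for one utterance (hypothetical formulation).

    pred_strength / true_strength: scalar emotion strength scores.
    emotion_logits: raw scores from the auxiliary emotion predictor.
    true_emotion: index of the reference emotion class.
    aux_weight: assumed weight on the auxiliary task.
    """
    # Main task: absolute error on the strength score (assumed loss).
    strength_loss = abs(pred_strength - true_strength)
    # Auxiliary task: cross-entropy on the emotion class (assumed loss).
    probs = softmax(emotion_logits)
    emotion_loss = -math.log(probs[true_emotion])
    return strength_loss + aux_weight * emotion_loss
```

Setting `aux_weight` to 0 reduces this to the strength regression loss alone, which makes it easy to compare single-task and multi-task training in such a sketch.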

Code & Models

Repositories

ttslr/strengthnet (tf, Official)

Taxonomy

Topics: Emotion and Mood Recognition · Speech Recognition and Synthesis · Speech and Audio Processing