Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning

Rui Liu; Berrak Sisman; Björn Schuller; Guanglai Gao; Haizhou Li

arXiv:2206.07229 · cs.SD · June 16, 2022 · 1 citation

TL;DR

This paper introduces StrengthNet, a deep learning model that accurately assesses emotion strength in speech, generalizing well to both seen and unseen data by leveraging multi-domain emotional data fusion.

Contribution

The paper presents a novel multi-task deep learning framework, StrengthNet, that improves emotion strength assessment for unseen speech domains, surpassing previous SVM-based methods.

Findings

01 High correlation with ground truth in emotion strength prediction

02 Effective generalization to unseen speech data

03 Fusion of multi-domain emotional data enhances performance
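
The first finding refers to correlation between predicted and ground-truth strength. The summary does not say which correlation measure is used, so the following is only a minimal sketch assuming the Pearson coefficient; the function name `pearson` and the sample scores are hypothetical and do not come from the paper.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists (assumed metric)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance numerator and the two standard-deviation terms.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)

# Hypothetical strength scores in [0, 1]; illustrative values only.
predicted = [0.2, 0.5, 0.7, 0.9]
reference = [0.1, 0.4, 0.8, 1.0]
score = pearson(predicted, reference)
```

A value of `score` close to 1 would indicate that predictions rise and fall with the reference scores.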

Abstract

Emotion classification of speech and assessment of the emotion strength are required in applications such as emotional text-to-speech and voice conversion. An emotion attribute ranking function based on a Support Vector Machine (SVM) was previously proposed to predict emotion strength for an emotional speech corpus. However, the trained ranking function does not generalize to new domains, which limits its scope of application, especially for out-of-domain or unseen speech. In this paper, we propose a data-driven deep learning model, StrengthNet, to improve the generalization of emotion strength assessment for seen and unseen speech. This is achieved by fusing emotional data from various domains. We follow a multi-task learning network architecture that includes an acoustic encoder, a strength predictor, and an auxiliary emotion predictor. Experiments show that the predicted emotion…
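
The multi-task setup described in the abstract (a strength predictor trained alongside an auxiliary emotion predictor) can be illustrated with a combined objective. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the loss functions, so the absolute-error strength term, the cross-entropy emotion term, the `aux_weight` parameter, and all function names here are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of emotion-class logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def multitask_loss(pred_strength, true_strength,
                   emotion_logits, true_emotion, aux_weight=1.0):
    """Strength regression loss plus a weighted auxiliary classification loss.

    Assumptions (not from the paper): absolute error for strength,
    cross-entropy for the emotion class, and a scalar aux_weight.
    """
    strength_loss = abs(pred_strength - true_strength)
    probs = softmax(emotion_logits)
    emotion_loss = -math.log(probs[true_emotion])
    return strength_loss + aux_weight * emotion_loss

# Hypothetical example: one utterance, three emotion classes, true class 0.
loss = multitask_loss(0.7, 0.9, [2.0, 0.5, 0.1], 0)
```

Setting `aux_weight` to 0 reduces the objective to the strength term alone, which makes it easy to compare training with and without the auxiliary task.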

Code & Models

Repositories

ttslr/strengthnet
tf · Official

Taxonomy

Topics: Emotion and Mood Recognition · Speech Recognition and Synthesis · Speech and Audio Processing