Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning
Rui Liu, Berrak Sisman, Björn Schuller, Guanglai Gao, Haizhou Li

TL;DR
This paper introduces StrengthNet, a deep learning model that accurately assesses emotion strength in speech, generalizing well to both seen and unseen data by leveraging multi-domain emotional data fusion.
Contribution
The paper presents a novel multi-task deep learning framework, StrengthNet, that improves emotion strength assessment for unseen speech domains, surpassing previous SVM-based methods.
Findings
High correlation with ground truth in emotion strength prediction
Effective generalization to unseen speech data
Fusion of multi-domain emotional data enhances performance
Abstract
Emotion classification of speech and assessment of the emotion strength are required in applications such as emotional text-to-speech and voice conversion. The emotion attribute ranking function based on Support Vector Machine (SVM) was proposed to predict emotion strength for emotional speech corpus. However, the trained ranking function doesn't generalize to new domains, which limits the scope of applications, especially for out-of-domain or unseen speech. In this paper, we propose a data-driven deep learning model, i.e. StrengthNet, to improve the generalization of emotion strength assessment for seen and unseen speech. This is achieved by the fusion of emotional data from various domains. We follow a multi-task learning network architecture that includes an acoustic encoder, a strength predictor, and an auxiliary emotion predictor. Experiments show that the predicted emotion…
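The multi-task setup described in the abstract pairs a strength predictor with an auxiliary emotion predictor. The excerpt does not state which loss functions are used, so the following is only a minimal sketch under assumptions: an absolute-error term for strength, a cross-entropy term for the emotion class, and a hypothetical weight `aux_weight` balancing the two. The function names are illustrative and are not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multitask_loss(pred_strength, true_strength, emotion_logits,
                   true_emotion, aux_weight=1.0):
    """Assumed combined objective: strength regression + auxiliary emotion classification.

    pred_strength / true_strength: scalar strength scores.
    emotion_logits: one raw score per emotion class.
    true_emotion: index of the reference emotion class.
    aux_weight: hypothetical weight on the auxiliary term (not from the paper).
    """
    # Regression term (assumption): absolute error on the strength score.
    strength_loss = abs(pred_strength - true_strength)
    # Auxiliary term (assumption): cross-entropy over emotion classes.
    probs = softmax(emotion_logits)
    emotion_loss = -math.log(probs[true_emotion])
    return strength_loss + aux_weight * emotion_loss
```

In a sketch like this, setting `aux_weight=0.0` reduces the objective to the strength term alone, which makes it easy to compare training with and without the auxiliary emotion task.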
Taxonomy
Topics: Emotion and Mood Recognition · Speech Recognition and Synthesis · Speech and Audio Processing
