SpeechEQ: Speech Emotion Recognition based on Multi-scale Unified Datasets and Multitask Learning

Zuheng Kang; Junqing Peng; Jianzong Wang; Jing Xiao

arXiv:2206.13101 · cs.SD · July 29, 2022 · 1 citation

TL;DR

SpeechEQ introduces a unified multi-scale metric and multitask learning approach for speech emotion recognition, improving accuracy across Mandarin and English datasets by leveraging auxiliary tasks and a new dataset.

Contribution

The paper presents SpeechEQ, a novel framework that unifies SER tasks with a multi-scale metric and multitask learning, including a new Mandarin dataset and state-of-the-art results.

Findings

01

Outperforms baseline methods on the Mandarin CASIA and ESD datasets, improving accuracy by 8.0% and 6.5% respectively.

02

Achieves state-of-the-art weighted accuracy of 78.16% on IEMOCAP.

03

Demonstrates effectiveness of multitask learning with auxiliary tasks in SER.
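
The weighted accuracy cited above is, in common SER usage, the overall fraction of correctly classified utterances, while unweighted accuracy averages the per-class recalls so that rare emotions count as much as frequent ones. The sketch below illustrates that common convention; it is an assumption that the paper uses the same definitions, and the labels are toy data invented for illustration, not results from the paper.

```python
from collections import defaultdict


def weighted_accuracy(y_true, y_pred):
    """Overall fraction of correct predictions (every utterance counts equally)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls (every class counts equally)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy, made-up labels: the majority class is always right, one rare class is missed.
y_true = ["angry", "angry", "angry", "angry", "happy", "sad"]
y_pred = ["angry", "angry", "angry", "angry", "sad", "sad"]

wa = weighted_accuracy(y_true, y_pred)    # 5 of 6 utterances correct
ua = unweighted_accuracy(y_true, y_pred)  # mean of recalls 1.0, 0.0, 1.0
print(f"WA={wa:.4f}  UA={ua:.4f}")
```

On imbalanced data the two metrics diverge, which is why SER papers often report both.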

Abstract

Speech emotion recognition (SER) faces many challenges, and one of the main ones is that frameworks lack a unified standard. In this paper, we propose SpeechEQ, a framework for unifying SER tasks based on a multi-scale unified metric. This metric can be trained by Multitask Learning (MTL), which includes two emotion recognition tasks, Emotion States Category (ESC) and Emotion Intensity Scale (EIS), and two auxiliary tasks, phoneme recognition and gender recognition. For this framework, we build a Mandarin SER dataset, the SpeechEQ Dataset (SEQD). We conducted experiments on the public CASIA and ESD datasets in Mandarin, which show that our method outperforms baseline methods by a relatively large margin, yielding 8.0% and 6.5% improvements in accuracy respectively. Additional experiments on IEMOCAP with four emotion categories (i.e., angry, happy, sad, and neutral)…
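
To make the multitask setup concrete, here is a minimal sketch of how four task losses might be combined into one training objective. Everything specific in it is an assumption for illustration: the task names, the loss weights, the class counts, and the simplification of phoneme recognition to a single classification (in practice it is usually a sequence task). It is not the authors' implementation.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for one example, using a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


# Hypothetical weights: the two emotion tasks dominate, auxiliary tasks regularize.
TASK_WEIGHTS = {
    "emotion_category": 1.0,
    "emotion_intensity": 1.0,
    "phoneme": 0.3,
    "gender": 0.3,
}


def multitask_loss(task_logits, task_targets, weights=TASK_WEIGHTS):
    """Weighted sum of the per-task cross-entropy losses."""
    return sum(
        w * cross_entropy(task_logits[name], task_targets[name])
        for name, w in weights.items()
    )


# Toy example with all-zero logits, so each head predicts a uniform distribution.
logits = {
    "emotion_category": [0.0] * 4,   # e.g. angry, happy, sad, neutral
    "emotion_intensity": [0.0] * 3,  # assumed 3 intensity levels
    "phoneme": [0.0] * 5,            # assumed tiny phoneme inventory
    "gender": [0.0] * 2,
}
targets = {"emotion_category": 2, "emotion_intensity": 0, "phoneme": 4, "gender": 1}

loss = multitask_loss(logits, targets)
print(f"combined loss: {loss:.4f}")
```

In a real model the four heads would share one speech encoder, so gradients from the auxiliary tasks shape the representation used by the emotion heads.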

Taxonomy

Topics: Emotion and Mood Recognition · Speech Recognition and Synthesis · Sentiment Analysis and Opinion Mining