Enhancing Segment-Based Speech Emotion Recognition by Deep Self-Learning

Shuiyang Mao, P. C. Ching, and Tan Lee

arXiv:2103.16456 · eess.AS · March 31, 2021 · 1 citation

Open Access

TL;DR

This paper introduces a deep self-learning framework for segment-based speech emotion recognition that iteratively refines noisy segment labels, significantly improving model performance on emotional speech datasets.

Contribution

It proposes a novel deep self-learning approach that dynamically corrects segment labels, addressing label noise issues in segment-based speech emotion recognition.

Findings

01. Significant performance improvements on three emotional corpora.

02. Effective label correction reduces noise impact.

03. Enhanced robustness of emotion recognition models.

Abstract

Despite the widespread use of deep neural networks (DNNs) for speech emotion recognition (SER), their performance is severely restricted by the paucity of labeled training data. Recently, segment-based approaches for SER have been evolving; these train backbone networks on shorter segments instead of whole utterances and thus naturally augment the training examples without additional resources. However, one core challenge remains for segment-based approaches: most emotional corpora do not provide ground-truth labels at the segment level. To train a segment-based emotion model on such datasets in a supervised manner, the most common approach is to assign each segment the emotion label of its utterance. However, this practice typically introduces noisy (incorrect) labels, as emotional information is not uniformly distributed across the whole utterance. On the other hand, DNNs have been shown to…
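The label-inheritance practice described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `segment_with_inherited_labels`, the fixed segment length, the hop size, and the toy frame values are all assumptions made for illustration. It only shows how slicing one utterance multiplies the training examples and why every segment ends up with the same, possibly incorrect, label.

```python
from typing import List, Tuple


def segment_with_inherited_labels(
    frames: List[float],
    utterance_label: str,
    segment_len: int,
    hop: int,
) -> List[Tuple[List[float], str]]:
    """Slice one utterance into fixed-length segments (hypothetical helper).

    Every segment inherits the utterance-level label, which is the common
    practice the abstract identifies as a source of label noise.
    """
    if segment_len <= 0 or hop <= 0:
        raise ValueError("segment_len and hop must be positive")
    segments = []
    # Only full-length segments are kept; a trailing partial window is dropped.
    for start in range(0, len(frames) - segment_len + 1, hop):
        segments.append((frames[start:start + segment_len], utterance_label))
    return segments


# Toy utterance of 10 feature frames (assumed values, one scalar per frame).
utterance = [0.1 * i for i in range(10)]
pairs = segment_with_inherited_labels(utterance, "angry", segment_len=4, hop=2)
```

Here one labeled utterance yields several overlapping training segments, all tagged "angry", even though some of them may carry little or no emotional content.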

Taxonomy

Topics: Emotion and Mood Recognition · Speech and Audio Processing · Music and Audio Processing

Methods: Self-Learning