Curriculum Learning for Speech Emotion Recognition from Crowdsourced Labels
Reza Lotfian, Carlos Busso

TL;DR
This paper proposes a curriculum learning approach for speech emotion recognition that uses human annotation disagreement as a difficulty measure, leading to improved training efficiency and accuracy.
Contribution
It introduces a novel method to define curriculum difficulty based on inter-evaluator disagreement in crowdsourced labels for speech emotion recognition.
Findings
A curriculum ordered by evaluator disagreement improves model performance.
The approach yields significant gains over training without a curriculum.
The approach applies to regression, binary, and multi-class emotion recognition tasks.
Abstract
This study introduces a method to design a curriculum for machine-learning to maximize the efficiency during the training process of deep neural networks (DNNs) for speech emotion recognition. Previous studies in other machine-learning problems have shown the benefits of training a classifier following a curriculum where samples are gradually presented in increasing level of difficulty. For speech emotion recognition, the challenge is to establish a natural order of difficulty in the training set to create the curriculum. We address this problem by assuming that ambiguous samples for humans are also ambiguous for computers. Speech samples are often annotated by multiple evaluators to account for differences in emotion perception across individuals. While some sentences with clear emotional content are consistently annotated, sentences with more ambiguous emotional content present…
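The core idea in the abstract, ordering training samples by how much the evaluators disagree, can be illustrated with a minimal sketch. This is not the authors' implementation: the disagreement score (one minus the share of evaluators who chose the most common label), the cumulative-stage schedule, and the names `disagreement`, `build_curriculum`, and `n_stages` are all assumptions made for illustration, and the toy annotations are invented.

```python
from collections import Counter


def disagreement(labels):
    """Assumed difficulty score: 1 minus the share of evaluators
    who chose the most common label (0.0 means full agreement)."""
    counts = Counter(labels)
    return 1.0 - counts.most_common(1)[0][1] / len(labels)


def build_curriculum(annotations, n_stages):
    """Return cumulative training subsets, least ambiguous samples first.

    annotations: dict mapping a sample id to its list of evaluator labels.
    Stage k holds the easiest k/n_stages fraction of the data, so each
    stage is a superset of the previous one.
    """
    ordered = sorted(annotations, key=lambda sid: disagreement(annotations[sid]))
    stages = []
    for k in range(1, n_stages + 1):
        cutoff = -(-len(ordered) * k // n_stages)  # ceiling division
        stages.append(ordered[:cutoff])
    return stages


# Toy crowdsourced labels (hypothetical data).
annotations = {
    "utt1": ["happy", "happy", "happy"],
    "utt2": ["sad", "angry", "neutral"],
    "utt3": ["angry", "angry", "sad"],
    "utt4": ["neutral", "neutral", "neutral", "happy"],
}
stages = build_curriculum(annotations, n_stages=2)
```

In this sketch a model would be trained on `stages[0]` first and then on each later stage in turn, so the most ambiguous utterances are seen last. Other disagreement measures, such as label entropy or the variance of dimensional ratings, could be swapped in for `disagreement` without changing the rest of the code.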
