Label Uncertainty Modeling and Prediction for Speech Emotion Recognition using t-Distributions
Navin Raj Prabhu, Nale Lehmann-Willenbrock, Timo Gerkmann

TL;DR
This paper introduces a novel approach for modeling label uncertainty in speech emotion recognition using Student's t-distribution, which better accounts for small annotator samples than traditional Gaussian assumptions.
Contribution
The work proposes a t-distribution based model for label uncertainty, deriving a new loss function and demonstrating improved performance over Gaussian models in speech emotion recognition.
Findings
The t-distribution model outperforms the Gaussian model in uncertainty estimation.
The approach achieves state-of-the-art results on the AVEC'16 dataset.
The t-distribution model converges faster than Gaussian-based methods.
Abstract
As different people perceive others' emotional expressions differently, their annotations in terms of arousal and valence are per se subjective. To address this, these emotion annotations are typically collected by multiple annotators and averaged across annotators in order to obtain labels for arousal and valence. However, besides the average, the uncertainty of a label is also of interest and should be modeled and predicted for automatic emotion recognition. In the literature, for simplicity, label uncertainty modeling is commonly approached with a Gaussian assumption on the collected annotations. However, as the number of annotators is typically rather small due to resource constraints, we argue that the Gaussian approach is a rather crude assumption. In contrast, in this work we propose to model the label distribution using a Student's t-distribution which allows us to account…
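To make the small-sample argument concrete, the following is a minimal sketch comparing a Gaussian and a Student's t log-density fitted to a handful of annotations. It is not the loss function derived in the paper, which this summary does not reproduce. The annotation values, the choice of six annotators, the function names, and the use of the sample standard deviation as the t-distribution scale with n - 1 degrees of freedom are all assumptions made for illustration.

```python
import math
import statistics


def gaussian_logpdf(x, mean, std):
    """Log-density of a Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2 * math.pi)


def student_t_logpdf(x, loc, scale, dof):
    """Log-density of a location-scale Student's t-distribution."""
    z = (x - loc) / scale
    log_norm = (
        math.lgamma((dof + 1) / 2)
        - math.lgamma(dof / 2)
        - 0.5 * math.log(dof * math.pi)
        - math.log(scale)
    )
    return log_norm - (dof + 1) / 2 * math.log1p(z * z / dof)


# Hypothetical arousal ratings from six annotators for one frame (made-up values).
annotations = [0.10, 0.15, 0.05, 0.20, 0.12, 0.60]

mean = statistics.mean(annotations)
std = statistics.stdev(annotations)  # sample standard deviation (n - 1 denominator)
dof = len(annotations) - 1           # assumed degrees of freedom: n - 1

# Evaluate both densities four standard deviations away from the mean.
far_point = mean + 4 * std
gauss_lp = gaussian_logpdf(far_point, mean, std)
t_lp = student_t_logpdf(far_point, mean, std, dof)

print(f"Gaussian log-density at mean + 4*std:    {gauss_lp:.3f}")
print(f"Student's t log-density at mean + 4*std: {t_lp:.3f}")
```

With only a few annotators the t-distribution has heavier tails, so it assigns more probability to values far from the sample mean than the Gaussian does. As the degrees of freedom grow, the two densities coincide, which is why the distinction matters mainly when the annotator pool is small.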
Taxonomy
Topics: Emotion and Mood Recognition · Speech Recognition and Synthesis · Speech and Audio Processing
