Label Uncertainty Modeling and Prediction for Speech Emotion Recognition using t-Distributions

Navin Raj Prabhu; Nale Lehmann-Willenbrock; Timo Gerkmann

arXiv:2207.12135 · eess.AS · July 26, 2022 · 1 cites

TL;DR

This paper introduces a novel approach for modeling label uncertainty in speech emotion recognition using Student's t-distribution, which better accounts for small annotator samples than traditional Gaussian assumptions.

Contribution

The work proposes a t-distribution based model for label uncertainty, deriving a new loss function and demonstrating improved performance over Gaussian models in speech emotion recognition.

Findings

01

The t-distribution model outperforms the Gaussian model in uncertainty estimation.

02

The approach achieves state-of-the-art results on the AVEC'16 dataset.

03

The approach converges faster than Gaussian-based methods.

Abstract

As different people perceive others' emotional expressions differently, their annotations in terms of arousal and valence are per se subjective. To address this, these emotion annotations are typically collected from multiple annotators and averaged across annotators in order to obtain labels for arousal and valence. However, besides the average, the uncertainty of a label is also of interest and should be modeled and predicted for automatic emotion recognition. In the literature, for simplicity, label uncertainty modeling is commonly approached with a Gaussian assumption on the collected annotations. However, as the number of annotators is typically rather small due to resource constraints, we argue that the Gaussian approach is a rather crude assumption. In contrast, in this work we propose to model the label distribution using a Student's t-distribution which allows us to account…
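To make the small-sample argument concrete, here is a minimal sketch that fits both a Gaussian and a location-scale Student's t-distribution to one frame of annotations and compares how plausible each considers a rating far from the mean. It is not the loss function derived in the paper. The six ratings are invented for illustration, the helper names are hypothetical, and setting the degrees of freedom to the number of annotators minus one is an assumption made here for the example.

```python
import math
from statistics import mean, stdev


def gaussian_logpdf(x, mu, sigma):
    """Log-density of a Gaussian with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def student_t_logpdf(x, mu, sigma, nu):
    """Log-density of a location-scale Student's t with nu degrees of freedom."""
    z = (x - mu) / sigma
    log_norm = (
        math.lgamma((nu + 1.0) / 2.0)
        - math.lgamma(nu / 2.0)
        - 0.5 * math.log(nu * math.pi)
        - math.log(sigma)
    )
    return log_norm - 0.5 * (nu + 1.0) * math.log1p(z * z / nu)


# Hypothetical arousal ratings from six annotators for a single frame
# (made-up values, not taken from any dataset).
ratings = [0.10, 0.25, 0.18, 0.40, 0.05, 0.22]

n = len(ratings)
mu = mean(ratings)       # the usual averaged label
sigma = stdev(ratings)   # sample standard deviation (n - 1 in the denominator)
nu = n - 1               # assumption: degrees of freedom tied to annotator count

# A rating three sample standard deviations away from the mean.
far_rating = mu + 3.0 * sigma
ll_gauss = gaussian_logpdf(far_rating, mu, sigma)
ll_t = student_t_logpdf(far_rating, mu, sigma, nu)

print(f"Gaussian log-density:  {ll_gauss:.3f}")
print(f"Student-t log-density: {ll_t:.3f}")
```

With few annotators the t-distribution has heavier tails, so it assigns a higher log-density to the distant rating than the Gaussian does. As the degrees of freedom grow, the two densities coincide, which is why the difference matters mainly when annotators are scarce.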

Code & Models

Repositories

sp-uhh/label-uncertainty-ser
PyTorch · Official

Taxonomy

Topics: Emotion and Mood Recognition · Speech Recognition and Synthesis · Speech and Audio Processing