Dynamic Time-Alignment of Dimensional Annotations of Emotion using Recurrent Neural Networks
Sina Alisamir, Fabien Ringeval, Francois Portet

TL;DR
This paper introduces a Recurrent Neural Network (RNN)-based method that dynamically aligns subjective, often inconsistent emotion annotations from multiple annotators and synchronizes them with acoustic features. This improves inter-annotator agreement and prediction accuracy across several datasets.
Contribution
The proposed method uses RNNs to compensate for inconsistencies across annotations and to improve the synchronization between annotation traces and acoustic features, a novel approach in emotion recognition.
Findings
Significantly increased inter-annotator agreement.
Improved correlation between annotations and acoustic features.
Enhanced emotion prediction accuracy, especially for valence and arousal.
Abstract
Most automatic emotion recognition systems exploit time-continuous annotations of emotion to provide fine-grained descriptions of spontaneous expressions as observed in real-life interactions. As emotion is rather subjective, its annotation is usually performed by several annotators who each provide a trace for a given dimension, i.e., a time-continuous series describing a dimension such as arousal or valence. However, annotations of the same expression are rarely consistent between annotators, either in time or in value, which introduces bias and delay into the trace that is used to learn predictive models of emotion. We therefore propose a method that can dynamically compensate for inconsistencies across annotations and synchronise the traces with the corresponding acoustic features using Recurrent Neural Networks. Experimental evaluations were carried out on several emotion data sets that include Chinese,…
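The abstract describes annotation traces that lag behind the acoustic signal. One simple way to make that delay concrete is a static correction: pick a single fixed lag per trace that maximises its Pearson correlation with an acoustic feature, then shift the trace back by that lag. The sketch below is only that fixed-lag baseline, in contrast to the dynamic, RNN-based compensation the paper proposes; it is not the authors' method. The function names (`best_fixed_lag`, `shift_trace`), the assumption of equal-length frame-level inputs, and the synthetic signals in the demo are all illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def best_fixed_lag(feature: Sequence[float], trace: Sequence[float],
                   max_lag: int) -> Tuple[int, float]:
    """Hypothetical helper: return (lag, r) maximising corr(feature[t], trace[t + lag]).

    Assumes feature and trace are frame-aligned and of equal length.
    A positive lag means the annotator reacts `lag` frames after the feature.
    Ties keep the smallest lag.
    """
    best_lag, best_r = 0, -math.inf
    for lag in range(max_lag + 1):
        n = len(trace) - lag
        r = pearson(feature[:n], trace[lag:lag + n])
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r


def shift_trace(trace: Sequence[float], lag: int) -> List[float]:
    """Hypothetical helper: undo a fixed delay by dropping the first `lag` frames
    and padding the end with the last value, so the length is preserved."""
    if not 0 <= lag < len(trace):
        raise ValueError("lag must be in [0, len(trace))")
    shifted = list(trace[lag:])
    return shifted + [shifted[-1]] * lag


if __name__ == "__main__":
    # Synthetic data: an acoustic feature and an annotation trace delayed by 5 frames.
    feature = [math.sin(0.3 * t) for t in range(100)]
    trace = [0.0] * 5 + feature[:95]
    lag, r = best_fixed_lag(feature, trace, max_lag=20)
    print(lag)  # prints 5
    aligned = shift_trace(trace, lag)
```

A single global lag cannot follow delays that change over time or differ between annotators, which is the kind of inconsistency the abstract says a dynamic method is meant to handle.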
Taxonomy
Topics: Emotion and Mood Recognition · Music and Audio Processing · Speech and Audio Processing
