Human Feedback Driven Dynamic Speech Emotion Recognition
Ilya Fedorov, Dmitry Korobchenko

TL;DR
This paper introduces a dynamic speech emotion recognition framework that models sequences of emotions over time. It uses human feedback and the Dirichlet distribution to improve the modeling of emotional mixtures, with a particular focus on animating emotional 3D avatars.
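A Dirichlet distribution is defined over probability vectors, so it can score how well a predicted set of concentration parameters explains a target emotional mixture. The page does not give the paper's exact parameterization. The sketch below assumes that a model outputs positive concentration parameters alpha for each frame and is trained by minimizing the Dirichlet negative log-likelihood of the target mixture x. The function names are hypothetical and used only for illustration.

```python
import math
from typing import List, Sequence


def dirichlet_log_pdf(x: Sequence[float], alpha: Sequence[float]) -> float:
    """Log-density of mixture vector x under Dirichlet(alpha).

    x must lie strictly inside the probability simplex (entries > 0, sum to 1).
    """
    if len(x) != len(alpha):
        raise ValueError("x and alpha must have the same length")
    if any(a <= 0 for a in alpha):
        raise ValueError("concentration parameters must be positive")
    if any(v <= 0 for v in x) or abs(sum(x) - 1.0) > 1e-6:
        raise ValueError("x must lie strictly inside the simplex")
    alpha0 = sum(alpha)
    # log of the normalizing constant Gamma(alpha0) / prod Gamma(alpha_i)
    log_norm = math.lgamma(alpha0) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(v) for a, v in zip(alpha, x))


def dirichlet_nll(x: Sequence[float], alpha: Sequence[float]) -> float:
    """Negative log-likelihood: a possible per-frame training loss (assumption)."""
    return -dirichlet_log_pdf(x, alpha)


def expected_mixture(alpha: Sequence[float]) -> List[float]:
    """Mean of Dirichlet(alpha), usable as the predicted emotional mixture."""
    alpha0 = sum(alpha)
    return [a / alpha0 for a in alpha]


if __name__ == "__main__":
    # Hypothetical 3-emotion setup, e.g. (neutral, happy, sad).
    target = [0.8, 0.1, 0.1]
    print(dirichlet_nll(target, [8.0, 1.0, 1.0]))  # concentrated on the right emotion
    print(dirichlet_nll(target, [1.0, 1.0, 8.0]))  # concentrated on the wrong emotion
    print(expected_mixture([8.0, 1.0, 1.0]))
```

Unlike a softmax output, alpha also encodes confidence: its total alpha0 controls how sharply the distribution concentrates around the mean mixture.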
Contribution
It presents a novel multi-stage approach that combines a classical speech emotion recognition model, synthetic generation of emotional sequences, and human feedback, together with a new Dirichlet-based model of emotional mixtures.
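The page does not describe how the synthetic emotional sequences are generated. One plausible scheme, offered here only as an assumption, is to concatenate clips that each carry a single emotion label and build per-frame target mixtures. An optional linear crossfade at each boundary yields mixed targets during transitions. The segment format and the `fade` parameter are hypothetical.

```python
from typing import List, Sequence, Tuple

# (emotion index, segment length in frames); hypothetical representation
Segment = Tuple[int, int]


def segments_to_targets(
    segments: Sequence[Segment], num_emotions: int, fade: int = 0
) -> List[List[float]]:
    """Build per-frame emotion mixtures for a concatenated synthetic sequence.

    Each frame gets a one-hot vector for its segment's emotion. When the
    emotion changes, the first `fade` frames of the new segment linearly
    blend the previous emotion into the new one.
    """
    targets: List[List[float]] = []
    prev = None
    for emo, length in segments:
        for i in range(length):
            vec = [0.0] * num_emotions
            if prev is not None and prev != emo and i < fade:
                w = (i + 1) / (fade + 1)  # weight of the new emotion ramps up
                vec[prev] = 1.0 - w
                vec[emo] = w
            else:
                vec[emo] = 1.0
            targets.append(vec)
        prev = emo
    return targets


if __name__ == "__main__":
    # Two frames of emotion 0 followed by three frames of emotion 1.
    for frame in segments_to_targets([(0, 2), (1, 3)], num_emotions=2, fade=1):
        print(frame)
```

The corresponding audio would be concatenated in the same order. That step is omitted here because the frame rate and audio format are not stated on this page.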
Findings
Dirichlet-based emotional mixture modeling outperforms sliding window methods
Human feedback enhances model accuracy and simplifies annotation
Effective modeling of emotional sequences for 3D avatar animation
Abstract
This work explores a new area: dynamic speech emotion recognition. Unlike traditional methods, we assume that each audio track is associated with a sequence of emotions active at different moments in time. The study focuses in particular on the animation of emotional 3D avatars. We propose a multi-stage method that includes training a classical speech emotion recognition model, synthetically generating emotional sequences, and further improving the model based on human feedback. Additionally, we introduce a novel approach to modeling emotional mixtures based on the Dirichlet distribution. The models are evaluated against ground-truth emotions extracted from a dataset of 3D facial animations. We compare our models against the sliding window approach. Our experimental results show the effectiveness of the Dirichlet-based approach in modeling emotional mixtures. Incorporating…
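The sliding window baseline mentioned above can be sketched as follows. A clip-level classifier runs on overlapping windows, and each prediction is assigned to the window's center time. The window length, hop size, and `classify` callable are assumptions for illustration; the page does not specify the baseline's settings.

```python
from typing import Callable, List, Sequence, Tuple

Probs = List[float]


def sliding_window_predict(
    samples: Sequence[float],
    sample_rate: int,
    classify: Callable[[Sequence[float]], Probs],  # hypothetical clip-level model
    window_s: float = 1.0,
    hop_s: float = 0.25,
) -> List[Tuple[float, Probs]]:
    """Return (center time in seconds, emotion probabilities) per window."""
    win = int(round(window_s * sample_rate))
    hop = int(round(hop_s * sample_rate))
    if win <= 0 or hop <= 0:
        raise ValueError("window and hop must be at least one sample")
    if len(samples) < win:
        # Clip shorter than one window: classify the whole clip once.
        return [(len(samples) / 2 / sample_rate, classify(samples))]
    out: List[Tuple[float, Probs]] = []
    for start in range(0, len(samples) - win + 1, hop):
        chunk = samples[start:start + win]
        center = (start + win / 2) / sample_rate
        out.append((center, classify(chunk)))
    return out


if __name__ == "__main__":
    # Toy stand-in classifier: emotion 0 for "quiet" chunks, emotion 1 otherwise.
    def toy_classify(chunk: Sequence[float]) -> Probs:
        return [1.0, 0.0] if sum(chunk) / len(chunk) < 0.5 else [0.0, 1.0]

    audio = [0.0] * 6 + [1.0] * 6  # 3 s of toy "audio" at 4 Hz
    for t, p in sliding_window_predict(audio, 4, toy_classify, window_s=1.0, hop_s=0.5):
        print(t, p)
```

This baseline can only output what a single-emotion classifier predicts per window. That limitation motivates modeling emotional mixtures explicitly.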
Taxonomy
Topics: Emotion and Mood Recognition · Face recognition and analysis · Face and Expression Recognition
