Human Feedback Driven Dynamic Speech Emotion Recognition

Ilya Fedorov; Dmitry Korobchenko

arXiv:2508.14920 · cs.SD · August 22, 2025

TL;DR

This paper introduces a dynamic speech emotion recognition framework that predicts a sequence of emotions over the course of an audio track rather than a single label. It uses human feedback to refine the model and a Dirichlet distribution to model emotional mixtures, with a focus on animating emotional 3D avatars.

Contribution

The paper presents a multi-stage approach that combines training a classical speech emotion recognition model, synthetic generation of emotional sequences, and model improvement based on human feedback, together with a new Dirichlet-based model of emotional mixtures.

Findings

01. Dirichlet-based emotional mixture modeling outperforms sliding window methods

02. Human feedback enhances model accuracy and simplifies annotation

03. Effective modeling of emotional sequences for 3D avatar animation

Abstract

This work explores a new area: dynamic speech emotion recognition. Unlike traditional methods, we assume that each audio track is associated with a sequence of emotions active at different moments in time. The study focuses in particular on the animation of emotional 3D avatars. We propose a multi-stage method that includes training a classical speech emotion recognition model, synthetic generation of emotional sequences, and further model improvement based on human feedback. Additionally, we introduce a novel approach to modeling emotional mixtures based on the Dirichlet distribution. The models are evaluated against ground-truth emotions extracted from a dataset of 3D facial animations. We compare our models against the sliding window approach. Our experimental results show the effectiveness of the Dirichlet-based approach in modeling emotional mixtures. Incorporating…
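To make the idea of a Dirichlet-based emotional mixture concrete, the following is a minimal sketch and not the paper's implementation. It assumes a model outputs one positive concentration parameter per emotion for a given moment in time; the emotion labels, the `alpha` values, and both function names are hypothetical. The sketch shows only the standard Dirichlet mean (the expected mixture weights) and the Dirichlet log-density of an observed mixture.

```python
import math

# Hypothetical label set; the paper's actual emotion set is not given here.
EMOTIONS = ["neutral", "joy", "anger", "sadness"]


def dirichlet_mean(alpha):
    """Expected mixture weights under Dirichlet(alpha): alpha_i / sum(alpha)."""
    total = sum(alpha)
    return [a / total for a in alpha]


def dirichlet_log_pdf(weights, alpha):
    """Log-density of mixture weights (positive, summing to 1) under Dirichlet(alpha)."""
    # Log of the normalizing constant: log Gamma(sum alpha) - sum log Gamma(alpha_i).
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(w) for a, w in zip(alpha, weights))


# Hypothetical concentration parameters for a single moment in an audio track.
alpha = [1.0, 6.0, 2.0, 1.0]
mixture = dirichlet_mean(alpha)

for name, weight in zip(EMOTIONS, mixture):
    print(f"{name}: {weight:.2f}")
```

Under these assumptions, a larger total concentration expresses higher confidence in the mixture, while the ratios between parameters set the blend of emotions that could drive an avatar's expression.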

Taxonomy

Topics: Emotion and Mood Recognition · Face recognition and analysis · Face and Expression Recognition