Gen-SER: When the generative model meets speech emotion recognition
Taihui Wang, Jinzheng Zhao, Rilin Chen, Tong Lei, Wenwu Wang, Dong Yu

TL;DR
Gen-SER reformulates speech emotion recognition as a distribution shift problem: it projects discrete class labels into a continuous space, uses a generative model to transform the initial distribution into the terminal distribution, and classifies by comparing the generated terminal distribution with the ground-truth one.
Contribution
The paper presents a generative modeling framework for SER as an alternative to approaches based on classification models or large language models, and reports that the framework extends to other speech-understanding tasks.
Findings
Effective in speech emotion recognition tasks
Demonstrates extensibility to other speech-understanding tasks
Shows potential for broader classification applications
Abstract
Speech emotion recognition (SER) is crucial in speech understanding and generation. Most approaches are based on either classification models or large language models. Different from previous methods, we propose Gen-SER, a novel approach that reformulates SER as a distribution shift problem via generative models. We propose to project discrete class labels into a continuous space, and obtain the terminal distribution via sinusoidal taxonomy encoding. The target-matching-based generative model is adopted to transform the initial distribution into the terminal distribution efficiently. The classification is achieved by calculating the similarity of the generated terminal distribution and ground truth terminal distribution. The experimental results confirm the efficacy of the proposed method, demonstrating its extensibility to various speech-understanding tasks and suggesting its potential…
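The label-projection and similarity-based classification steps described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not specify the form of the sinusoidal taxonomy encoding, so a transformer-style sinusoidal encoding of the class index stands in for it; cosine similarity is assumed as the similarity measure; and the function names `sinusoidal_label_codes` and `classify_by_similarity` are hypothetical. The generative model itself is not implemented; a slightly perturbed copy of each class code stands in for its output.

```python
import numpy as np


def sinusoidal_label_codes(num_classes: int, dim: int) -> np.ndarray:
    """Map class indices 0..num_classes-1 to continuous vectors.

    Assumption: a transformer-style sinusoidal encoding of the class index,
    used as a stand-in for the paper's sinusoidal taxonomy encoding.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    positions = np.arange(num_classes)[:, None]                # shape (C, 1)
    freqs = 1.0 / (10000.0 ** (np.arange(0, dim, 2) / dim))    # shape (dim/2,)
    angles = positions * freqs                                 # shape (C, dim/2)
    codes = np.empty((num_classes, dim))
    codes[:, 0::2] = np.sin(angles)   # even dimensions hold sines
    codes[:, 1::2] = np.cos(angles)   # odd dimensions hold cosines
    return codes


def classify_by_similarity(generated: np.ndarray, codes: np.ndarray) -> np.ndarray:
    """Return, for each generated vector, the index of the most similar class code.

    Assumption: cosine similarity is the similarity measure.
    """
    g = generated / np.linalg.norm(generated, axis=-1, keepdims=True)
    c = codes / np.linalg.norm(codes, axis=-1, keepdims=True)
    return np.argmax(g @ c.T, axis=-1)


# Stand-in for the generative model's output: each class code plus small noise.
codes = sinusoidal_label_codes(num_classes=4, dim=16)
rng = np.random.default_rng(0)
generated = codes + 0.01 * rng.standard_normal(codes.shape)
print(classify_by_similarity(generated, codes))
```

In this sketch the predicted class is simply the nearest code in the continuous label space, so adding a class only requires adding one more code vector rather than changing an output layer.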
Taxonomy
Topics: Emotion and Mood Recognition · Speech Recognition and Synthesis · Sentiment Analysis and Opinion Mining
