Gen-SER: When the generative model meets speech emotion recognition

Taihui Wang; Jinzheng Zhao; Rilin Chen; Tong Lei; Wenwu Wang; Dong Yu

arXiv:2601.20573·cs.SD·January 29, 2026

Open Access

TL;DR

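Gen-SER recasts speech emotion recognition as a distribution shift problem: discrete class labels are projected into a continuous space, a generative model transforms an initial distribution into the terminal distribution, and classification is done by measuring the similarity between the generated and ground-truth terminal distributions.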

Contribution

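Gen-SER is a generative modeling framework for SER, offered as an alternative to the classification models and large language models that most existing approaches rely on. The authors report that it extends to other speech-understanding tasks and may apply to classification problems more broadly.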

Findings

01 Effective in speech emotion recognition tasks
02 Demonstrates extensibility to other speech-understanding tasks
03 Shows potential for broader classification applications

Abstract

Speech emotion recognition (SER) is crucial in speech understanding and generation. Most approaches are based on either classification models or large language models. Unlike previous methods, we propose Gen-SER, a novel approach that reformulates SER as a distribution shift problem via generative models. We propose to project discrete class labels into a continuous space and obtain the terminal distribution via sinusoidal taxonomy encoding. A target-matching-based generative model is adopted to transform the initial distribution into the terminal distribution efficiently. Classification is achieved by calculating the similarity between the generated terminal distribution and the ground-truth terminal distribution. The experimental results confirm the efficacy of the proposed method, demonstrating its extensibility to various speech-understanding tasks and suggesting its potential…
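The classification-by-similarity idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the sinusoidal taxonomy encoding, so the sketch assumes a positional-encoding-style sin/cos code per class index, uses single vectors in place of distributions, and replaces the trained generative model with a hypothetical `fake_generator` that returns a noisy copy of the target code. The label set, dimension, and all function names are assumptions made for illustration.

```python
import math
import random


def sinusoidal_code(index, dim=16, base=10000.0):
    """Map an integer class index to a dim-dimensional sin/cos vector.

    Assumed form, borrowed from sinusoidal positional encodings; the
    paper's actual taxonomy encoding may differ.
    """
    vec = []
    for i in range(dim // 2):
        freq = 1.0 / (base ** (2 * i / dim))
        vec.append(math.sin(index * freq))
        vec.append(math.cos(index * freq))
    return vec


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify(generated, targets):
    """Return the label whose target code is most similar to `generated`."""
    return max(targets, key=lambda label: cosine(generated, targets[label]))


# Hypothetical label set; each label gets a fixed continuous target code.
EMOTIONS = ["angry", "happy", "neutral", "sad"]
TARGETS = {label: sinusoidal_code(k) for k, label in enumerate(EMOTIONS)}


def fake_generator(label, noise=0.05, seed=0):
    """Stand-in for the trained generative model (assumption).

    Returns the target code for `label` plus small Gaussian noise,
    mimicking an imperfect generated terminal sample.
    """
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, noise) for x in TARGETS[label]]


predicted = classify(fake_generator("sad"), TARGETS)
```

In a real system the generated vector would come from the generative model conditioned on the speech input; `predicted` is simply the label whose code lies closest to that vector. Because every label is a point in a shared continuous space, adding a class under this scheme means adding a target code rather than resizing an output layer, which is one way to read the extensibility claim.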

Taxonomy

Topics: Emotion and Mood Recognition · Speech Recognition and Synthesis · Sentiment Analysis and Opinion Mining