Towards LLM-Empowered Fine-Grained Speech Descriptors for Explainable Emotion Recognition

Youjun Chen; Xurong Xie; Haoning Xu; Mengzhe Geng; Guinan Li; Chengxi Deng; Huimeng Wang; Shujie Hu; Xunying Liu

arXiv:2505.23236 · cs.SD · May 30, 2025

TL;DR

This paper introduces an LLM-empowered approach for fine-grained, explainable speech emotion recognition that disentangles speech features and improves accuracy on benchmark datasets.

Contribution

It proposes a novel end-to-end method combining LLM fine-tuning, feature disentanglement, and VAE compression for enhanced SER explainability and performance.
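VAE compression is usually paired with an Information Bottleneck objective. The sketch below shows one standard variational form: a stochastic code z is sampled from frame-level features with the reparameterization trick, and a KL penalty to a standard normal prior (an upper bound on how much information z keeps about the input) is added to the task loss with weight `beta`. It is an illustration under stated assumptions, not the authors' implementation. The linear encoder, the dimensions (768-dim input, 64-dim code), `beta`, and the function names `vae_bottleneck` and `ib_loss` are hypothetical, and the paper's exact architecture and loss weighting are not given on this page.

```python
import numpy as np

rng = np.random.default_rng(0)

def vae_bottleneck(h, w_mu, w_logvar, rng):
    """Compress frame-level features h (T x D) into a stochastic code z (T x K).

    Hypothetical linear encoder: a real system would use a learned network.
    """
    mu = h @ w_mu                      # posterior means, shape (T, K)
    logvar = h @ w_logvar              # posterior log-variances, shape (T, K)
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps  # reparameterization trick
    # KL(q(z|h) || N(0, I)) per frame, summed over latent dims, averaged over frames
    kl = -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=-1).mean()
    return z, kl

def ib_loss(task_loss, kl, beta):
    """Variational IB objective: larger beta forces a more compressed code."""
    return task_loss + beta * kl

# Toy usage with assumed sizes (T frames, D input dims, K code dims)
T, D, K = 50, 768, 64
h = rng.standard_normal((T, D))
w_mu = rng.standard_normal((D, K)) * 0.01
w_logvar = rng.standard_normal((D, K)) * 0.01
z, kl = vae_bottleneck(h, w_mu, w_logvar, rng)
loss = ib_loss(task_loss=1.0, kl=kl, beta=1e-3)
```

Sweeping `beta` is one plausible way to trade detail against compression, which is how an IB-derived code could be used to adjust feature granularity.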

Findings

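1. Outperforms comparable LLaMA-based SER baselines on the IEMOCAP and MELD benchmarks.
2. Achieves statistically significant gains in SER unweighted accuracy of up to 4.0% and 3.7% absolute (5.4% and 6.6% relative).
3. Uses fine-grained speech emotion descriptors (e.g., pitch, tone, emphasis) to make SER predictions more interpretable.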
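The gains are reported in unweighted accuracy (UA), which in SER is conventionally the mean of per-class recall, so rare emotions count as much as frequent ones. The sketch below contrasts UA with plain (weighted) accuracy on a toy, made-up label set, and backs out the baseline UA implied by an absolute and a relative gain, assuming relative gain = absolute gain / baseline UA. The function names are hypothetical, and because the reported figures are rounded, the implied baselines are only approximate.

```python
from collections import defaultdict

def unweighted_accuracy(y_true, y_pred):
    """Mean per-class recall: every emotion class is weighted equally."""
    correct, total = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)

def weighted_accuracy(y_true, y_pred):
    """Plain accuracy: every utterance is weighted equally."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def implied_baseline(abs_gain, rel_gain):
    """Baseline score implied by an absolute gain and a relative gain (rel = abs / baseline)."""
    return abs_gain / rel_gain

# Toy data: a classifier biased towards the majority "neu" class
y_true = ["neu", "neu", "neu", "neu", "ang", "sad"]
y_pred = ["neu", "neu", "neu", "neu", "neu", "sad"]
wa = weighted_accuracy(y_true, y_pred)    # 5/6: looks good
ua = unweighted_accuracy(y_true, y_pred)  # (1 + 0 + 1) / 3: missed "ang" hurts

base_a = implied_baseline(4.0, 0.054)  # roughly 74% UA
base_b = implied_baseline(3.7, 0.066)  # roughly 56% UA
```

The same UA quantity is what scikit-learn calls balanced accuracy (macro-averaged recall).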

Abstract

This paper presents a novel end-to-end LLM-empowered approach to explainable speech emotion recognition (SER). Fine-grained speech emotion descriptor (SED) features, e.g., pitch, tone and emphasis, are disentangled from HuBERT self-supervised learning (SSL) representations by alternating LLM fine-tuning between a joint SER-SED prediction task and an automatic speech recognition (ASR) task. VAE-compressed HuBERT features derived via the Information Bottleneck (IB) principle are used to adjust feature granularity. Experiments on the IEMOCAP and MELD benchmarks demonstrate that our approach consistently outperforms comparable LLaMA-based SER baselines, including those using either (a) alternating multi-task fine-tuning alone or (b) feature disentanglement only. Statistically significant increases in SER unweighted accuracy of up to 4.0% and 3.7% absolute (5.4% and 6.6% relative) are obtained. More importantly, the emotion descriptors offer further explainability for SER.
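Alternating fine-tuning interleaves updates from the two tasks instead of mixing them in one loss. The sketch below shows a simple round-robin batch schedule over a joint SER-SED task and an ASR task. It is hypothetical throughout: the abstract does not specify the alternation granularity (per batch, per epoch, or otherwise), the prompt wording, or the target format, so `make_ser_sed_example`, `make_asr_example`, `alternating_schedule`, the prompts, and the "emotion: ...; descriptors: ..." target string are illustrative assumptions. The actual model update step is omitted.

```python
from dataclasses import dataclass
from itertools import cycle

@dataclass
class Example:
    audio_id: str
    prompt: str
    target: str

def make_ser_sed_example(audio_id, emotion, descriptors):
    # Hypothetical joint target: emotion label followed by descriptor text
    prompt = "Describe the speaking style and classify the emotion."
    target = f"emotion: {emotion}; descriptors: {', '.join(descriptors)}"
    return Example(audio_id, prompt, target)

def make_asr_example(audio_id, transcript):
    # Hypothetical ASR target: the plain transcript
    return Example(audio_id, "Transcribe the speech.", transcript)

def alternating_schedule(task_batches, n_steps):
    """Round-robin over tasks: step i draws the next batch of task i % n_tasks."""
    names = list(task_batches)
    iters = {name: cycle(batches) for name, batches in task_batches.items()}
    for step in range(n_steps):
        name = names[step % len(names)]
        yield step, name, next(iters[name])

# Toy usage: one single-example batch per task
ser_sed = [[make_ser_sed_example("utt1", "angry", ["high pitch", "tense tone", "emphasis on 'now'"])]]
asr = [[make_asr_example("utt1", "i need it now")]]
schedule = list(alternating_schedule({"ser_sed": ser_sed, "asr": asr}, n_steps=4))
```

In a real training loop each `(step, task, batch)` triple would feed one optimizer update on the shared LLM, so the speech encoder features are shaped alternately by emotion-descriptor and transcription objectives.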

Code & Models

Repositories

SEUJames23/explainable-emotion-recognition (official)

Taxonomy

Topics: Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining · Explainable Artificial Intelligence (XAI)