Evaluating Emotion Recognition in Spoken Language Models on Emotionally Incongruent Speech

Pedro Corrêa; João Lima; Victor Moreno; Lucas Ueda; Paula Dornhofer Paro Costa

arXiv:2510.25054 · cs.CL · October 31, 2025

TL;DR

This paper evaluates how well spoken language models recognize emotions in speech when the semantic content and speech expressiveness are incongruent, revealing a reliance on textual semantics over acoustic cues.

Contribution

The paper evaluates four SLMs on speech emotion recognition using emotionally incongruent speech, and releases a new dataset and code for further research.

Findings

01. SLMs rely mainly on textual semantics for emotion recognition.

02. Speech expressiveness has limited influence on model predictions.

03. The study highlights the need for better integration of audio and text modalities.

Abstract

Advancements in spoken language processing have driven the development of spoken language models (SLMs), designed to achieve universal audio understanding by jointly learning text and audio representations for a wide range of tasks. Although promising results have been achieved, there is growing discussion regarding these models' generalization capabilities and the extent to which they truly integrate audio and text modalities in their internal representations. In this work, we evaluate four SLMs on the task of speech emotion recognition using a dataset of emotionally incongruent speech samples, a condition under which the semantic content of the spoken utterance conveys one emotion while speech expressiveness conveys another. Our results indicate that SLMs rely predominantly on textual semantics rather than speech emotion to perform the task, indicating that text-related…
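
The evaluation idea described in the abstract can be illustrated with a small sketch: for each incongruent sample, check whether the model's predicted emotion matches the emotion implied by the words or the emotion conveyed by the voice. This is only an illustration under stated assumptions, not the authors' released code or metric. The `Sample` fields, the `reliance_rates` function, and the toy labels are all hypothetical.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Sample(NamedTuple):
    # All field names are hypothetical; they are not taken from the paper's dataset.
    text_emotion: str    # emotion implied by the semantic content of the utterance
    speech_emotion: str  # emotion conveyed by the speech expressiveness
    predicted: str       # label returned by the model under evaluation


def reliance_rates(samples: Iterable[Sample]) -> dict[str, float]:
    """Share of incongruent samples where the prediction follows text, speech, or neither."""
    counts: Counter = Counter()
    total = 0
    for s in samples:
        if s.text_emotion == s.speech_emotion:
            continue  # congruent samples cannot separate the two cues, so skip them
        total += 1
        if s.predicted == s.text_emotion:
            counts["text"] += 1
        elif s.predicted == s.speech_emotion:
            counts["speech"] += 1
        else:
            counts["neither"] += 1
    if total == 0:
        raise ValueError("no incongruent samples to evaluate")
    return {key: counts[key] / total for key in ("text", "speech", "neither")}


# Toy, made-up labels purely to show the call; these are not results from the paper.
toy = [
    Sample("happy", "sad", "happy"),
    Sample("angry", "happy", "angry"),
    Sample("sad", "angry", "sad"),
    Sample("happy", "angry", "angry"),
]
rates = reliance_rates(toy)
print(rates)
```

Under this sketch, a text rate far above the speech rate would be consistent with the reliance on textual semantics that the abstract reports, while a high speech rate would suggest the model is attending to acoustic cues.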

Code & Models

Datasets

Wonder239/DEAF · dataset · 311 dl
