Cross-Lingual Speech Emotion Recognition: Humans vs. Self-Supervised Models

Zhichen Han; Tianqi Geng; Hui Feng; Jiahong Yuan; Korin Richmond; Yuanchao Li

arXiv:2409.16920·eess.AS·May 1, 2025·2 cites

TL;DR

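This study compares humans and self-supervised models on cross-lingual speech emotion recognition. It finds that models, given appropriate knowledge transfer, can adapt to a target language and reach performance comparable to native speakers, and that dialect significantly affects emotion recognition for listeners without prior linguistic and paralinguistic background.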

Contribution

The paper analyzes SSL models against human listeners in cross-lingual SER, covering layer-wise analysis, parameter-efficient fine-tuning, and dialect effects. The abstract notes that cross-lingual scenarios have received limited prior research.

Findings

01. Models can adapt to new languages with knowledge transfer.

02. Dialect significantly affects emotion recognition accuracy.

03. Humans and models show different emotion recognition behaviors.

Abstract

Utilizing Self-Supervised Learning (SSL) models for Speech Emotion Recognition (SER) has proven effective, yet limited research has explored cross-lingual scenarios. This study presents a comparative analysis between human performance and SSL models, beginning with a layer-wise analysis and an exploration of parameter-efficient fine-tuning strategies in monolingual, cross-lingual, and transfer learning contexts. We further compare the SER ability of models and humans at both utterance- and segment-levels. Additionally, we investigate the impact of dialect on cross-lingual SER through human evaluation. Our findings reveal that models, with appropriate knowledge transfer, can adapt to the target language and achieve performance comparable to native speakers. We also demonstrate the significant effect of dialect on SER for individuals without prior linguistic and paralinguistic background.…
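The abstract mentions a layer-wise analysis of SSL models but does not describe the procedure on this page. As a rough illustration only, one common way to inspect SSL layers is to mean-pool each layer's frame-level features into an utterance-level vector and then combine the layers with softmax-normalized weights. The sketch below assumes that setup; the function names and toy numbers are hypothetical and are not taken from the paper or its repository.

```python
import math

def mean_pool(frames):
    """Average frame-level feature vectors over time into one utterance-level vector."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / n for d in range(dim)]

def softmax(scores):
    """Numerically stable softmax over a list of raw layer scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_layer_sum(layer_embeddings, layer_scores):
    """Combine per-layer utterance vectors using softmax-normalized layer weights."""
    weights = softmax(layer_scores)
    dim = len(layer_embeddings[0])
    return [
        sum(w * emb[d] for w, emb in zip(weights, layer_embeddings))
        for d in range(dim)
    ]

# Toy stand-in for SSL hidden states (assumption): 3 layers x 2 frames x 2 dims.
hidden_states = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [2.0, 2.0]],
    [[5.0, 1.0], [1.0, 5.0]],
]

# One utterance-level vector per layer; each could feed its own probe classifier.
per_layer = [mean_pool(layer) for layer in hidden_states]

# Equal raw scores give uniform weights, i.e. a plain average across layers.
combined = weighted_layer_sum(per_layer, [0.0, 0.0, 0.0])
```

In a real experiment the per-layer vectors would come from a pretrained SSL encoder, and either a separate classifier per layer or learned layer scores would indicate which layers carry the most emotion-relevant information.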

Code & Models

Repositories

zhan7721/crosslingual_ser (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis