Ensembling Multilingual Pre-Trained Models for Predicting Multi-Label Regression Emotion Share from Speech

Bagus Tris Atmaja; Akira Sasou

arXiv:2309.11014 · eess.AS · September 21, 2023

Open Access

TL;DR

This paper introduces an ensemble approach combining multilingual pre-trained speech models to improve multi-label emotion share prediction, demonstrating enhanced performance over previous monolingual fusion methods.

Contribution

It proposes an ensemble learning method for multilingual speech emotion recognition, addressing data scarcity and the difficulty of evaluating models across datasets and languages.
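This page does not say how the ensemble combines the models' outputs. As a minimal sketch, assuming the simplest late-fusion rule (an unweighted or weighted mean of each model's predicted emotion-share vector per utterance), the fusion step could look like the following. The helper `ensemble_mean`, the model names, the weights, and all scores are hypothetical placeholders, not the authors' implementation or data.

```python
from typing import Dict, List, Optional

# Each model's output: one row per utterance, one predicted share per emotion label.
Predictions = List[List[float]]


def ensemble_mean(predictions: Dict[str, Predictions],
                  weights: Optional[Dict[str, float]] = None) -> Predictions:
    """Hypothetical late fusion: (weighted) mean of per-model emotion-share predictions."""
    names = list(predictions)
    if not names:
        raise ValueError("need at least one model")
    if weights is None:
        weights = {name: 1.0 for name in names}  # plain unweighted mean
    n_utts = len(predictions[names[0]])
    n_labels = len(predictions[names[0]][0]) if n_utts else 0
    # All models must predict the same utterances and the same emotion labels.
    for name in names:
        rows = predictions[name]
        if len(rows) != n_utts or any(len(r) != n_labels for r in rows):
            raise ValueError(f"shape mismatch for model {name!r}")
    total = sum(weights[name] for name in names)
    fused = []
    for i in range(n_utts):
        # Average each emotion dimension independently across models.
        fused.append([
            sum(weights[name] * predictions[name][i][j] for name in names) / total
            for j in range(n_labels)
        ])
    return fused


# Hypothetical outputs of two pre-trained models for 2 utterances x 3 emotions.
preds = {
    "model_a": [[0.2, 0.5, 0.1], [0.6, 0.0, 0.3]],
    "model_b": [[0.4, 0.3, 0.1], [0.2, 0.2, 0.5]],
}
fused = ensemble_mean(preds)
weighted = ensemble_mean(preds, weights={"model_a": 3.0, "model_b": 1.0})
```

Averaging is only one possible fusion rule; weighting, stacking a small regressor on top of the model outputs, or selecting models per emotion are equally plausible, and nothing here indicates which one the paper uses.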

Findings

01. Ensemble learning improved Spearman correlation to 0.537 on test data.

02. Multilingual models outperform monolingual fusion methods.

03. The approach effectively handles emotion share regression from speech.

Abstract

Speech emotion recognition has evolved from research into practical applications. Previous studies of emotion recognition from speech have focused on developing models for specific datasets such as IEMOCAP. The lack of data in the emotion-modeling domain makes it challenging to evaluate models on other datasets, as well as to evaluate speech emotion recognition models that work in a multilingual setting. This paper proposes an ensemble learning approach that fuses the results of pre-trained models for emotion share recognition from speech. The models were chosen to accommodate multilingual data in English and Spanish. The results show that ensemble learning improves on both the single-model baseline and the previous best model, which was based on late fusion. Performance is measured using the Spearman rank correlation coefficient since the task is a regression problem with…
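The abstract names the Spearman rank correlation coefficient as the metric, but the excerpt is truncated before it explains how scores are aggregated over the emotion labels. A minimal sketch, assuming Spearman's ρ is computed per emotion (Pearson correlation of tie-averaged ranks) and then averaged over labels; that aggregation is an assumption, and the data below are invented. `scipy.stats.spearmanr` computes the same per-column statistic if SciPy is available.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("constant input: correlation is undefined")
    return cov / (vx * vy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho = Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_spearman(y_true: List[List[float]], y_pred: List[List[float]]) -> float:
    """Assumed aggregation: per-emotion rho, averaged over emotion labels."""
    n_labels = len(y_true[0])
    rhos = [spearman([row[j] for row in y_true], [row[j] for row in y_pred])
            for j in range(n_labels)]
    return sum(rhos) / n_labels


# Invented data: 3 utterances x 2 emotions.
y_true = [[0.1, 0.9], [0.4, 0.2], [0.8, 0.5]]
y_pred = [[0.2, 0.1], [0.3, 0.7], [0.9, 0.6]]
score = mean_spearman(y_true, y_pred)  # rho = 1.0 for emotion 0, -1.0 for emotion 1
```

Because Spearman's ρ only compares rank orderings, a model can score well even if its predicted shares are miscalibrated in absolute value, which is one reason it suits a ranking-like regression target.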

Taxonomy

Topics: Emotion and Mood Recognition · Speech Recognition and Synthesis · Speech and Audio Processing