BSC-UPC at EmoSPeech-IberLEF2024: Attention Pooling for Emotion Recognition
Marc Casals-Salvador, Federico Costa, Miquel India, Javier Hernando

TL;DR
This paper presents an attention pooling-based speech emotion recognition system using pre-trained models, achieving first place in the IberLEF 2024 challenge with an 86.69% Macro F1-Score.
Contribution
It introduces a novel approach combining pre-trained speech and text models with attention pooling for emotion recognition in Spanish speech data.
Findings
Achieved first place in IberLEF 2024 challenge
Attained 86.69% Macro F1-Score
Demonstrated effectiveness of attention pooling in SER
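For readers unfamiliar with the reported metric, the following is a minimal sketch of how a Macro F1-Score is computed: the F1 of each class is calculated separately and the per-class values are averaged with equal weight. The function name `macro_f1` and the emotion labels in the example are hypothetical and chosen only for illustration; they are not taken from the challenge data or from the authors' evaluation code.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical labels, for illustration only.
y_true = ["anger", "joy", "joy", "neutral"]
y_pred = ["anger", "joy", "neutral", "neutral"]
score = macro_f1(y_true, y_pred)
```

Because every class contributes equally to the average, a system cannot reach a high Macro F1 by doing well only on the most frequent emotion.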
Abstract
Speech emotion recognition (SER) has long been an open problem in machine learning. It is an active field that has changed substantially over the last few decades, with applications that could affect daily life. The Iberian Languages Evaluation Forum (IberLEF) of 2024 accordingly held a competitive challenge to advance SER results on a Spanish corpus. This paper presents the approach followed to participate in this competition. The main architecture uses different pre-trained speech and text models to extract features from both modalities, together with an attention pooling mechanism. The proposed system achieved first position in the challenge with a Macro F1-Score of 86.69%.
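The summary does not spell out the exact pooling formulation, so the following is only a minimal sketch of one common form of attention pooling: each frame-level feature vector is scored against a query vector, the scores are normalized with a softmax, and the frames are averaged using those weights. The names `attention_pool`, `frames`, and `query` are assumptions made for this illustration, as is the scaling by the square root of the feature dimension; in a trained system the query would be a learned parameter and the frames would come from the pre-trained speech or text model.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, query):
    """Collapse a variable-length sequence of feature vectors into one vector.

    frames: list of T vectors, each of length D (e.g. per-frame embeddings).
    query:  vector of length D (assumed learnable; fixed here for illustration).
    Returns (pooled_vector, attention_weights).
    """
    dim = len(query)
    # Scaled dot-product score per frame (the scaling is an assumption).
    scores = [
        sum(f * q for f, q in zip(frame, query)) / math.sqrt(dim)
        for frame in frames
    ]
    weights = softmax(scores)
    # Weighted sum of the frames, one output value per feature dimension.
    pooled = [
        sum(w * frame[d] for w, frame in zip(weights, frames))
        for d in range(dim)
    ]
    return pooled, weights


# Toy input: three 2-dimensional frames and an all-zero query.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [0.0, 0.0]
pooled, weights = attention_pool(frames, query)
```

With an all-zero query every frame receives the same weight, so the result reduces to plain mean pooling; a non-zero query lets the model emphasize the frames it finds most informative for the emotion label.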
Taxonomy
Topics: Emotion and Mood Recognition
Methods: Softmax · Attention Is All You Need · Attention Pooling
