Developing a High-performance Framework for Speech Emotion Recognition in Naturalistic Conditions Challenge for Emotional Attribute Prediction

Thanathai Lertpetchpun; Tiantian Feng; Dani Byrd; Shrikanth Narayanan

arXiv:2506.10930·cs.LG·June 13, 2025

Open Access

TL;DR

This paper introduces a speech emotion recognition framework for naturalistic conditions that uses multimodal and multi-task learning to address annotator disagreement and data imbalance. It ranked first in Task 2 of the IS25-SER Challenge.

Contribution

The paper presents a reproducible framework that combines multimodal learning, multi-task learning, and imbalanced-data handling to improve SER performance in naturalistic settings. The framework ranked first in Task 2 of the IS25-SER Challenge.

Findings

01 Achieved top performance in the IS25-SER Challenge

02 Utilized multimodal and multi-task learning strategies

03 Ensemble of two systems yielded best results

Abstract

Speech emotion recognition (SER) in naturalistic conditions presents a significant challenge for the speech processing community. Challenges include disagreement in labeling among annotators and imbalanced data distributions. This paper presents a reproducible framework that achieves superior (top 1) performance in the Emotion Recognition in Naturalistic Conditions Challenge (IS25-SER Challenge) - Task 2, evaluated on the MSP-Podcast dataset. Our system is designed to tackle the aforementioned challenges through multimodal learning, multi-task learning, and imbalanced data handling. Specifically, our best system is trained by adding text embeddings, predicting gender, and including "Other" (O) and "No Agreement" (X) samples in the training set. Our system's results secured both first and second places in the IS25-SER Challenge, and the top performance was achieved by a simple…
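
The combination of text embeddings and an auxiliary gender-prediction task described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `MultimodalMultiTaskHead`, the layer sizes, the choice of three attribute outputs, the mean-squared-error attribute loss, and the `aux_weight` value are all assumptions made for illustration. The sketch only shows the general pattern of fusing a speech embedding with a text embedding and training a shared representation on a main task plus an auxiliary one.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_linear(in_dim, out_dim):
    """Return small random weights and a zero bias for one linear layer."""
    return rng.normal(0.0, 0.02, size=(in_dim, out_dim)), np.zeros(out_dim)


class MultimodalMultiTaskHead:
    """Hypothetical prediction head: fused embeddings -> attributes + gender."""

    def __init__(self, speech_dim, text_dim, hidden_dim=64, n_attributes=3):
        # n_attributes=3 is an assumption, not a value stated in the source.
        self.w_shared, self.b_shared = init_linear(speech_dim + text_dim, hidden_dim)
        self.w_attr, self.b_attr = init_linear(hidden_dim, n_attributes)
        self.w_gender, self.b_gender = init_linear(hidden_dim, 1)

    def forward(self, speech_emb, text_emb):
        # Multimodal fusion by simple concatenation (an assumed fusion choice).
        fused = np.concatenate([speech_emb, text_emb], axis=-1)
        # Shared hidden layer with ReLU, used by both task heads.
        hidden = np.maximum(0.0, fused @ self.w_shared + self.b_shared)
        attributes = hidden @ self.w_attr + self.b_attr
        gender_logit = (hidden @ self.w_gender + self.b_gender).squeeze(-1)
        return attributes, gender_logit


def multitask_loss(attributes, attr_targets, gender_logit, gender_targets,
                   aux_weight=0.1):
    """Main attribute loss plus a weighted auxiliary gender loss."""
    # Mean squared error on the attribute predictions (assumed main loss).
    mse = np.mean((attributes - attr_targets) ** 2)
    # Numerically stable binary cross-entropy computed from logits.
    bce = np.mean(
        np.maximum(gender_logit, 0.0)
        - gender_logit * gender_targets
        + np.log1p(np.exp(-np.abs(gender_logit)))
    )
    return mse + aux_weight * bce
```

In this sketch the gender head shares the hidden layer with the attribute head, so gradients from the auxiliary task would shape the same representation. The embeddings passed to `forward` would come from pretrained speech and text encoders, which are outside the scope of the example.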

Taxonomy

Topics: Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining · Speech Recognition and Synthesis