Speech Emotion Recognition Using Fine-Tuned DWFormer: A Study on Track 1 of the IERP Challenge 2024
Honghong Wang, Xupeng Jia, Jing Deng, Rong Zheng

TL;DR
This paper introduces a fine-tuned DWFormer model for speech emotion recognition, incorporating data augmentation and score fusion, achieving first place in the IERP Challenge 2024 using solely audio features.
Contribution
It presents a novel application of fine-tuning DWFormer with data augmentation and score fusion for emotion recognition in audio, outperforming other methods in the challenge.
Findings
Achieved first place in Track 1 of IERP Challenge 2024.
Demonstrated the effectiveness of data augmentation and score fusion.
Outperformed other participating teams in emotion recognition accuracy.
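As a rough illustration of what score fusion can look like in general, the sketch below averages per-class scores from several models and picks the highest fused score. It is not the authors' implementation: the function names `fuse_scores` and `predict_label`, the weighted-average rule, and the example scores are all assumptions made for illustration, since the summary does not specify how the fusion was done.

```python
from typing import List, Optional, Sequence


def fuse_scores(
    model_scores: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Weighted average of per-class scores from several models (hypothetical helper)."""
    n_models = len(model_scores)
    if n_models == 0:
        raise ValueError("need scores from at least one model")
    if weights is None:
        # Assumption: equal weighting when no weights are given.
        weights = [1.0 / n_models] * n_models
    if len(weights) != n_models:
        raise ValueError("one weight per model is required")

    n_classes = len(model_scores[0])
    fused = [0.0] * n_classes
    for scores, weight in zip(model_scores, weights):
        if len(scores) != n_classes:
            raise ValueError("all models must score the same classes")
        for i, score in enumerate(scores):
            fused[i] += weight * score
    return fused


def predict_label(fused: Sequence[float]) -> int:
    """Index of the class with the highest fused score."""
    return max(range(len(fused)), key=lambda i: fused[i])


# Made-up scores from two hypothetical models over two emotion classes.
fused = fuse_scores([[0.6, 0.4], [0.2, 0.8]])
label = predict_label(fused)
```

Averaging is only one option; other fusion rules (for example, learned weights or voting) fit the same interface by changing how `fused` is computed.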
Abstract
Emotion recognition is an active research topic in artificial intelligence. Most existing emotion recognition models aim to improve the accuracy of discrete emotion label prediction. Given the direct relationship between human personality and emotion, as well as the significant inter-individual differences in subjective emotional expression, the IERP Challenge 2024 incorporates personality traits into emotion recognition research. This paper presents the Fosafer submissions to Track 1 of the IERP Challenge 2024. This task primarily concerns the recognition of emotions in audio, while also providing text and audio features. In Track 1, we used only audio-based features and fine-tuned a pre-trained speech emotion recognition model, DWFormer, with data augmentation and score fusion strategies, thereby…
