EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark
Ziyang Ma, Mingjie Chen, Hezhao Zhang, Zhisheng Zheng, Wenxi Chen, Xiquan Li, Jiaxin Ye, Xie Chen, Thomas Hain

TL;DR
EmoBox is a comprehensive multilingual speech emotion recognition toolkit and benchmark that standardizes data splits and provides extensive cross- and intra-corpus evaluations across multiple languages and datasets.
Contribution
It introduces a universal benchmark and toolkit for SER, enabling consistent evaluation across diverse datasets and languages, and addresses key challenges in dataset partitioning and reproducibility.
Findings
10 pre-trained models evaluated on 32 datasets in 14 languages
Cross-corpus SER results on 4 balanced datasets
Largest SER benchmark across multiple languages and datasets
Abstract
Speech emotion recognition (SER) is an important part of human-computer interaction, receiving extensive attention from both industry and academia. However, the current research field of SER has long suffered from the following problems: 1) There are few reasonable and universal splits of the datasets, making it difficult to compare different models and methods. 2) No commonly used benchmark covers numerous corpora and languages for researchers to refer to, making reproduction a burden. In this paper, we propose EmoBox, an out-of-the-box multilingual multi-corpus speech emotion recognition toolkit, along with a benchmark for both intra-corpus and cross-corpus settings. For intra-corpus settings, we carefully designed the data partitioning for different datasets. For cross-corpus settings, we employ a foundation SER model, emotion2vec, to mitigate annotation errors and obtain a test set that…
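To make the idea of a fixed, reusable intra-corpus split concrete, here is a minimal sketch of speaker-independent fold assignment, in which all utterances from one speaker land in the same fold. This is not EmoBox's partitioning code, and the abstract does not state which scheme each dataset uses. The function names and the record fields (`id`, `speaker`, `emotion`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def speaker_independent_folds(utterances, n_folds):
    """Assign every speaker, with all of their utterances, to exactly one fold.

    `utterances` is a list of dicts with a hypothetical "speaker" key.
    """
    by_speaker = defaultdict(list)
    for utt in utterances:
        by_speaker[utt["speaker"]].append(utt)

    folds = [[] for _ in range(n_folds)]
    # Deterministic order: speakers with the most utterances first, ties by name.
    ordered = sorted(by_speaker, key=lambda s: (-len(by_speaker[s]), s))
    for speaker in ordered:
        # Greedily place the speaker in the currently smallest fold.
        smallest = min(range(n_folds), key=lambda i: len(folds[i]))
        folds[smallest].extend(by_speaker[speaker])
    return folds


def train_test_for_fold(folds, test_index):
    """Use one fold as the test set and the remaining folds as training data."""
    test = folds[test_index]
    train = [u for i, fold in enumerate(folds) if i != test_index for u in fold]
    return train, test


# Toy data (invented for this example): four speakers, eight utterances.
toy = [
    {"id": f"{spk}_{i}", "speaker": spk, "emotion": "neutral"}
    for spk, n in [("s1", 3), ("s2", 2), ("s3", 2), ("s4", 1)]
    for i in range(n)
]
folds = speaker_independent_folds(toy, n_folds=2)
train, test = train_test_for_fold(folds, test_index=0)
```

Because the assignment is deterministic, anyone running the same function on the same metadata obtains identical folds, which is the property that makes results from different models comparable.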
Taxonomy
Topics: Emotion and Mood Recognition
Methods: Sparse Evolutionary Training
