LSSED: a large-scale dataset and benchmark for speech emotion recognition
Weiquan Fan, Xiangmin Xu, Xiaofen Xing, Weidong Chen, Dongyan Huang

TL;DR
This paper introduces LSSED, a large-scale English speech emotion dataset with data from 820 subjects, along with pre-trained models to advance speech emotion recognition and related applications like mental health analysis.
Contribution
It provides a large-scale, real-world speech emotion dataset and pre-trained models to facilitate research and transfer learning in speech emotion recognition.
Findings
Large-scale dataset improves model performance
Pre-trained models enhance downstream task accuracy
Dataset and models promote research in emotion recognition
Abstract
Speech emotion recognition is a vital contributor to the next generation of human-computer interaction (HCI). However, existing small-scale databases have limited the development of related research. In this paper, we present LSSED, a challenging large-scale English speech emotion dataset, which has data collected from 820 subjects to simulate real-world distribution. In addition, we release some pre-trained models based on LSSED, which can not only promote the development of speech emotion recognition, but can also be transferred to related downstream tasks such as mental health analysis, where data is extremely difficult to collect. Finally, our experiments show the necessity of large-scale datasets and the effectiveness of pre-trained models. The dataset will be released at https://github.com/tobefans/LSSED.
Taxonomy
Topics: Emotion and Mood Recognition · Speech Recognition and Synthesis · Speech and dialogue systems
