A cross-corpus study on speech emotion recognition
Rosanna Milner, Md Asif Jalal, Raymond W. M. Ng, Thomas Hain

TL;DR
This study investigates whether training on acted speech emotion datasets can improve recognition of natural emotions, finding that domain adaptation techniques enhance cross-corpus generalization.
Contribution
It introduces a bi-directional LSTM with attention for cross-corpus emotion recognition and evaluates domain adaptation methods to improve transferability.
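As a rough illustration of what an attention mechanism over recurrent outputs does, the sketch below pools a sequence of frame-level vectors into a single utterance-level vector using softmax weights. It is a minimal sketch in plain Python, not the authors' implementation: the dot-product scoring, the `query` vector, and the names `softmax` and `attention_pool` are assumptions made for illustration, and the frame vectors stand in for bi-directional LSTM outputs.

```python
import math

def softmax(scores):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(frames, query):
    # frames: list of T vectors (stand-ins for BLSTM outputs per frame).
    # query: hypothetical learned vector used to score each frame.
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    # Weighted sum of frames gives one fixed-size utterance representation.
    pooled = [sum(w * h[d] for w, h in zip(weights, frames)) for d in range(dim)]
    return pooled, weights

frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [0.0, 0.0]  # an all-zero query scores every frame equally
pooled, weights = attention_pool(frames, query)
```

With a non-zero query, frames that align with it receive larger weights, so the pooled vector is dominated by the frames the model treats as most informative for the emotion decision.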
Findings
Transfer of information from acted to natural datasets is possible.
Domain adversarial training improves cross-corpus emotion recognition.
Training on multiple datasets benefits model generalization.
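Domain adversarial training is commonly implemented with a gradient reversal layer: the shared encoder receives the emotion-classifier gradient unchanged, plus the domain-classifier gradient with its sign flipped and scaled. The sketch below shows only that gradient combination in plain Python. Whether the paper uses exactly this formulation is an assumption, and the names `grl_forward`, `grl_backward`, `encoder_gradient`, and the scaling factor `lam` are hypothetical.

```python
def grl_forward(features):
    # Forward pass: the gradient reversal layer is the identity.
    return list(features)

def grl_backward(grad_from_domain_classifier, lam):
    # Backward pass: flip the sign and scale by lam, so the encoder is
    # pushed to make corpora harder to tell apart.
    return [-lam * g for g in grad_from_domain_classifier]

def encoder_gradient(grad_emotion, grad_domain, lam):
    # Total gradient reaching the shared encoder: emotion gradient
    # plus the reversed domain gradient.
    reversed_domain = grl_backward(grad_domain, lam)
    return [ge + gd for ge, gd in zip(grad_emotion, reversed_domain)]

g = encoder_gradient([0.5, -1.0], [2.0, 4.0], lam=0.5)
```

Setting `lam` to zero reduces this to ordinary single-task training, which is one way to see the adversarial term as an add-on to the emotion objective.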
Abstract
For speech emotion datasets, it has been difficult to acquire large quantities of reliable data and acted emotions may be over the top compared to less expressive emotions displayed in everyday life. Lately, larger datasets with natural emotions have been created. Instead of ignoring smaller, acted datasets, this study investigates whether information learnt from acted emotions is useful for detecting natural emotions. Cross-corpus research has mostly considered cross-lingual and even cross-age datasets, and difficulties arise from different methods of annotating emotions causing a drop in performance. To be consistent, four adult English datasets covering acted, elicited and natural emotions are considered. A state-of-the-art model is proposed to accurately investigate the degradation of performance. The system involves a bi-directional LSTM with an attention mechanism to classify…
Taxonomy
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
