EmoTale: An Enacted Speech-emotion Dataset in Danish
Maja J. Hjuler, Harald V. Skat-Rørdam, Line H. Clemmensen, Sneha Das

TL;DR
EmoTale is a new Danish and English enacted speech-emotion dataset. Speech emotion recognition (SER) models trained on it show that self-supervised embeddings outperform traditional hand-crafted features.
Contribution
This paper introduces EmoTale, a novel corpus of Danish and English speech recordings with enacted emotion annotations, and validates it with speech emotion recognition (SER) models built on self-supervised speech model embeddings and openSMILE features.
Findings
Self-supervised speech embeddings outperform hand-crafted openSMILE features.
The best SER model achieves 64.1% unweighted average recall (UAR) on EmoTale under leave-one-speaker-out evaluation.
EmoTale's predictive power is comparable to that of DES, the only other Danish emotional speech database.
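The 64.1% figure is an unweighted average recall: recall is computed separately for each emotion class and then averaged, so every class counts equally no matter how many utterances it has. The sketch below is a minimal, dependency-free illustration of that metric; the label strings in the usage example are hypothetical and are not EmoTale's actual label set.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls over the classes that occur in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    hits = defaultdict(int)    # correctly predicted utterances per true class
    totals = defaultdict(int)  # number of utterances per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Hypothetical labels: "angry" recall is 2/3 and "neutral" recall is 1/1,
# so UAR = (2/3 + 1) / 2 = 0.8333..., while plain accuracy is 3/4 = 0.75.
y_true = ["angry", "angry", "angry", "neutral"]
y_pred = ["angry", "angry", "neutral", "neutral"]
print(round(unweighted_average_recall(y_true, y_pred), 4))  # 0.8333
```

This corresponds to macro-averaged recall (for example scikit-learn's `recall_score(..., average="macro")`) when every predicted label also appears among the true labels. Because the metric does not reward always guessing the majority class, it is a common choice for class-imbalanced emotion corpora.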
Abstract
While multiple emotional speech corpora exist for commonly spoken languages, there is a lack of functional datasets for less widely spoken languages, such as Danish. To our knowledge, Danish Emotional Speech (DES), published in 1997, is the only other database of Danish emotional speech. We present EmoTale, a corpus comprising Danish and English speech recordings with their associated enacted emotion annotations. We demonstrate the validity of the dataset by investigating and presenting its predictive power using speech emotion recognition (SER) models. We develop SER models for EmoTale and the reference datasets using self-supervised speech model (SSLM) embeddings and the openSMILE feature extractor. We find the embeddings superior to the hand-crafted features. The best model achieves an unweighted average recall (UAR) of 64.1% on the EmoTale corpus using leave-one-speaker-out…
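Leave-one-speaker-out evaluation holds out all recordings of one speaker per fold and trains on the remaining speakers, so the model is always tested on a voice it has never heard. The sketch below only generates the folds from a list of per-utterance speaker IDs; the speaker IDs are hypothetical, and how the paper aggregates the per-fold scores is not stated here, so that step is left out.

```python
from typing import Iterator, List, Sequence, Tuple


def leave_one_speaker_out(
    speakers: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_speaker, train_indices, test_indices), one fold per speaker."""
    for held_out in sorted(set(speakers)):
        # Every utterance of the held-out speaker goes to the test split.
        test_idx = [i for i, s in enumerate(speakers) if s == held_out]
        # All other speakers' utterances form the training split.
        train_idx = [i for i, s in enumerate(speakers) if s != held_out]
        yield held_out, train_idx, test_idx


# Hypothetical per-utterance speaker IDs for five recordings.
speakers = ["spk1", "spk1", "spk2", "spk3", "spk3"]
for spk, train_idx, test_idx in leave_one_speaker_out(speakers):
    print(spk, train_idx, test_idx)
# spk1 [2, 3, 4] [0, 1]
# spk2 [0, 1, 3, 4] [2]
# spk3 [0, 1, 2] [3, 4]
```

scikit-learn's `LeaveOneGroupOut`, called with `groups=speakers`, produces the same kind of split. Grouping by speaker rather than shuffling utterances prevents a speaker's voice characteristics from leaking between training and test data.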
Taxonomy
Topics: Emotion and Mood Recognition · Digital Communication and Language · Speech and Dialogue Systems
