Embedded Emotions -- A Data Driven Approach to Learn Transferable Feature Representations from Raw Speech Input for Emotion Recognition
Dominik Schiller, Silvan Mertes, Elisabeth André

TL;DR
This paper explores transferring knowledge from large text and audio datasets to improve emotion recognition from speech, demonstrating promising results especially from textual features in elderly speaker narratives.
Contribution
It introduces a data-driven transfer learning approach for emotion recognition from raw speech, leveraging large corpora to learn transferable feature representations.
Findings
Text-based features outperform audio features in emotion classification.
Audio features perform well on the development set but less consistently on the test set.
Text features outperform the baseline by 5.7 percentage points in unweighted average recall.
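Unweighted average recall (UAR), the metric quoted above, is the mean of the per-class recalls, so each class counts equally no matter how many samples it has. The sketch below is a minimal illustration of that definition, not the challenge's official scoring code; the function name and the toy labels "low" and "high" are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1

    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy data: a classifier that always predicts "low".
y_true = ["low"] * 6 + ["high"] * 2
y_pred = ["low"] * 8

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(uar, accuracy)
```

In this toy case the accuracy is 0.75 while the UAR is only 0.5, which is why UAR is a common choice when class sizes are uneven.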
Abstract
Traditional approaches to automatic emotion recognition rely on handcrafted features. More recently, however, the advent of deep learning has enabled algorithms to learn meaningful representations of input data automatically. In this paper, we investigate the applicability of transferring knowledge learned from large text and audio corpora to the task of automatic emotion recognition. To evaluate the practicability of our approach, we take part in this year's Interspeech ComParE Elderly Emotion Sub-Challenge, where the goal is to classify spoken narratives of elderly people with respect to the emotion of the speaker. Our results show that the learned feature representations can be effectively applied for classifying emotions from spoken language. We found the performance of the features extracted from the audio signal to be not as consistent as those that have…
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Emotion and Mood Recognition
