Learning Transferable Features for Speech Emotion Recognition
Alison Marczewski, Adriano Veloso, Nívio Ziviani

TL;DR
This paper introduces a deep learning architecture that leverages transferable features to improve speech emotion recognition across diverse domains, addressing challenges like limited labeled data and domain variability.
Contribution
It proposes a deep architecture for transfer learning in speech emotion recognition. A convolutional network extracts domain-shared features, and a long short-term memory (LSTM) network classifies emotions using domain-specific features.
Findings
Transferable features improve recognition accuracy by up to 18.4%.
Cross-corpora experiments demonstrate robustness across diverse speech emotion domains.
Domain adaptation experiments and ablation studies identify which source domains contribute most to transfer.
Abstract
Emotion recognition from speech is one of the key steps towards emotional intelligence in advanced human-machine interaction. Identifying emotions in human speech requires learning features that are robust and discriminative across diverse domains that differ in terms of language, spontaneity of speech, recording conditions, and types of emotions. This corresponds to a learning scenario in which the joint distributions of features and labels may change substantially across domains. In this paper, we propose a deep architecture that jointly exploits a convolutional network for extracting domain-shared features and a long short-term memory network for classifying emotions using domain-specific features. We use transferable features to enable model adaptation from multiple source domains, given the sparseness of speech emotion data and the fact that target domains are short of labeled…
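The split described in the abstract, with a convolutional extractor shared by all domains and a separate LSTM classifier per domain, can be illustrated with a minimal NumPy forward pass. This is a sketch under stated assumptions and not the authors' implementation. The input format (frames of 13 MFCC-like features), the layer sizes, the use of the LSTM's final hidden state, and every function and dictionary name here are hypothetical. No training or adaptation procedure is shown.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical sizes: 13 features per frame, 8 conv filters, kernel 3, LSTM width 16.
N_FEATS, N_FILTERS, KERNEL, HIDDEN = 13, 8, 3, 16

def conv1d_relu(x, w, b):
    """Valid 1-D convolution over time, then ReLU.
    x: (T, F) frames; w: (K, F, C) kernel; b: (C,) bias -> (T-K+1, C)."""
    k = w.shape[0]
    out = np.empty((x.shape[0] - k + 1, w.shape[2]))
    for t in range(out.shape[0]):
        # Contract the (K, F) window against each of the C filters.
        out[t] = np.tensordot(x[t:t + k], w, axes=([0, 1], [0, 1])) + b
    return np.maximum(out, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_last_hidden(seq, W, U, b):
    """Standard LSTM over seq (T, D); returns the final hidden state (H,)."""
    h = np.zeros(U.shape[0])
    c = np.zeros(U.shape[0])
    for x_t in seq:
        i, f, o, g = np.split(x_t @ W + h @ U + b, 4)  # input, forget, output, candidate
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Domain-shared convolutional parameters: a single copy reused by every domain.
shared_conv = {"w": rng.normal(0, 0.1, (KERNEL, N_FEATS, N_FILTERS)),
               "b": np.zeros(N_FILTERS)}

def make_domain_head(n_emotions):
    """Hypothetical domain-specific LSTM plus linear softmax classifier."""
    return {"W": rng.normal(0, 0.1, (N_FILTERS, 4 * HIDDEN)),
            "U": rng.normal(0, 0.1, (HIDDEN, 4 * HIDDEN)),
            "b": np.zeros(4 * HIDDEN),
            "V": rng.normal(0, 0.1, (HIDDEN, n_emotions)),
            "c": np.zeros(n_emotions)}

def predict(frames, head):
    feats = conv1d_relu(frames, shared_conv["w"], shared_conv["b"])  # shared
    h = lstm_last_hidden(feats, head["W"], head["U"], head["b"])     # domain-specific
    return softmax(h @ head["V"] + head["c"])

# Hypothetical source and target domains with different emotion label sets.
heads = {"source_a": make_domain_head(4), "target": make_domain_head(6)}
utterance = rng.normal(size=(50, N_FEATS))  # 50 frames of 13 features
probs = predict(utterance, heads["target"])
print(probs.shape, round(float(probs.sum()), 6))  # (6,) 1.0
```

Every domain head reads from the same `shared_conv` parameters, which makes those features "domain-shared" in this sketch. Each head keeps its own LSTM and output layer, so domains with different emotion inventories can coexist. In a transfer setting, a natural option is to learn `shared_conv` on the source domains and then fit only the target head, but the sketch does not claim that this is the paper's training procedure.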
Taxonomy
Methods: Memory Network
