Unsupervised Cross-Lingual Speech Emotion Recognition Using Pseudo Multilabel
Jin Li, Nan Yan, Lan Wang

TL;DR
This paper introduces an unsupervised cross-lingual speech emotion recognition method using pseudo multilabels and external memory, significantly improving accuracy across multiple low-resource languages.
Contribution
It proposes a novel neural network approach with external memory and pseudo multilabel generation for cross-lingual SER, addressing domain differences without labeled target data.
Findings
Significant accuracy improvements on Urdu, Skropus, ShEMO, and EMO-DB datasets.
Effective cross-lingual transfer without target domain labels.
Code availability facilitates further research.
Abstract
Speech Emotion Recognition (SER) in a single language has achieved remarkable results through deep learning approaches in the last decade. However, cross-lingual SER remains a challenge in real-world applications due to the large difference between the source and target domain distributions. To address this issue, we propose an unsupervised cross-lingual Neural Network with Pseudo Multilabel (NNPM) that is trained to learn the emotion similarities between source domain features inside an external memory adjusted to identify emotion in cross-lingual databases. NNPM introduces a novel approach that leverages external memory to store source domain features and generates a pseudo multilabel for each target domain sample by computing the similarities between the external memory and the target domain features. We evaluate our approach on speech emotion databases in multiple languages.…
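The similarity step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the use of cosine similarity, the fixed threshold, the per-class aggregation, and the names `cosine` and `pseudo_multilabel` are all assumptions made for illustration. It only shows how a multi-hot pseudo label could be derived by comparing one target-domain feature against source features held in a memory.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pseudo_multilabel(target_feat, memory, num_classes, threshold=0.5):
    """Build a multi-hot pseudo label for one target-domain feature.

    memory: list of (source_feature, emotion_class_index) pairs.
    Assumption (not from the paper): a class is marked positive when any
    stored source feature of that class has cosine similarity >= threshold
    with the target feature.
    """
    label = [0] * num_classes
    for feat, cls in memory:
        if cosine(target_feat, feat) >= threshold:
            label[cls] = 1
    return label


# Toy memory with three hypothetical emotion classes (indices 0, 1, 2).
memory = [
    ([1.0, 0.0], 0),
    ([0.0, 1.0], 1),
    ([1.0, 1.0], 2),
]
target = [1.0, 0.2]
label = pseudo_multilabel(target, memory, num_classes=3, threshold=0.5)
print(label)  # [1, 0, 1]
```

In this toy case the target feature is close to the stored features of classes 0 and 2, so both are marked positive. That is the sense in which the label is "multi": a target sample can be associated with more than one source emotion at once.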
Taxonomy
Topics: Emotion and Mood Recognition · Speech and Audio Processing · Sentiment Analysis and Opinion Mining
