Continuous Metric Learning For Transferable Speech Emotion Recognition and Embedding Across Low-resource Languages
Sneha Das, Nicklas Leander Lund, Nicole Nadine Lønfeldt, Anne Katrine Pagsberg, Line H. Clemmensen

TL;DR
This paper introduces a novel continuous metric learning approach for speech emotion recognition that enhances transferability across low-resource languages by leveraging semi-supervised learning and dimensional emotion annotations.
Contribution
It proposes what the authors describe as among the first continuous metric learning methods for SER, improving transferability and performance in low-resource language scenarios through semi-supervised training of a denoising autoencoder.
Findings
Outperforms the baseline unsupervised autoencoder in emotion classification accuracy
Achieves better correlation with dimensional emotion variables
Comparable to BERT-based models with lower complexity
Abstract
Speech emotion recognition (SER) refers to the technique of inferring the emotional state of an individual from speech signals. SER systems continue to garner interest due to their wide applicability. Although the domain is mainly founded on signal processing, machine learning, and deep learning, generalizing over languages remains a challenge. However, developing generalizable and transferable models is critical due to a lack of sufficient resources in terms of data and labels for languages beyond the most commonly spoken ones. To improve performance over languages, we propose a denoising autoencoder with semi-supervision using a continuous metric loss based on either activation or valence. The novelty of this work lies in our proposal of continuous metric learning, which is among the first proposals on the topic to the best of our knowledge. Furthermore, to address the lack of…
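The abstract does not spell out the loss, so the following is only a minimal sketch of one way a continuous metric loss could be combined with a reconstruction objective. It assumes the metric term penalizes the gap between pairwise embedding distances and pairwise differences in a continuous label (activation or valence), and that only some samples are labeled. The function names, the squared-error form, and the `weight` parameter are hypothetical illustrations, not the authors' formulation.

```python
import numpy as np

def continuous_metric_loss(embeddings, labels):
    """Hypothetical metric term: mean squared gap between pairwise
    embedding distances and pairwise absolute label differences."""
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # Pairwise Euclidean distances between embeddings, shape (n, n).
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    emb_dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Pairwise absolute differences of the continuous label, shape (n, n).
    label_dist = np.abs(labels[:, None] - labels[None, :])
    # Average over distinct pairs only (upper triangle, diagonal excluded).
    iu = np.triu_indices(len(labels), k=1)
    return float(np.mean((emb_dist[iu] - label_dist[iu]) ** 2))

def semi_supervised_loss(x_clean, x_recon, embeddings, labels, labeled_mask, weight=1.0):
    """Assumed combination: reconstruction error on all samples plus the
    metric term on the labeled subset only."""
    recon = float(np.mean((np.asarray(x_recon, dtype=float)
                           - np.asarray(x_clean, dtype=float)) ** 2))
    mask = np.asarray(labeled_mask, dtype=bool)
    if mask.sum() < 2:
        # Fewer than two labeled samples: no pairs to compare.
        return recon
    metric = continuous_metric_loss(np.asarray(embeddings)[mask],
                                    np.asarray(labels)[mask])
    return recon + weight * metric

# Toy usage: three 1-D embeddings, with the third sample unlabeled.
emb = np.array([[0.0], [1.0], [3.0]])
valence = np.array([0.0, 1.0, np.nan])
x = np.ones((3, 4))
total = semi_supervised_loss(x, x, emb, valence, [True, True, False])
```

In this sketch the denoising aspect would come from feeding a corrupted input to the encoder while reconstructing `x_clean`; the encoder and decoder themselves are left out to keep the example small.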
Taxonomy
Topics: Emotion and Mood Recognition · Speech Recognition and Synthesis · Speech and Audio Processing
Methods: Attention Is All You Need · Linear Layer · Residual Connection · Softmax · Dropout · Weight Decay · Dense Connections · Attention Dropout · Multi-Head Attention
