Continuous Metric Learning For Transferable Speech Emotion Recognition and Embedding Across Low-resource Languages

Sneha Das; Nicklas Leander Lund; Nicole Nadine Lønfeldt; Anne Katrine Pagsberg; Line H. Clemmensen

arXiv:2203.14867 · eess.AS · March 29, 2022

Open Access

TL;DR

This paper introduces a novel continuous metric learning approach for speech emotion recognition that enhances transferability across low-resource languages by leveraging semi-supervised learning and dimensional emotion annotations.

Contribution

It proposes what the authors describe as among the first continuous metric learning methods for SER, aiming to improve transferability and performance in low-resource language scenarios through semi-supervised training of a denoising autoencoder.

Findings

01 Outperforms baseline unsupervised autoencoder in emotion classification accuracy

02 Achieves better correlation with dimensional emotion variables

03 Comparable to BERT-based models with lower complexity

Abstract

Speech emotion recognition (SER) refers to the technique of inferring the emotional state of an individual from speech signals. SER continues to garner interest due to its wide applicability. Although the domain is mainly founded on signal processing, machine learning, and deep learning, generalizing over languages remains a challenge. However, developing generalizable and transferable models is critical due to a lack of sufficient resources in terms of data and labels for languages beyond the most commonly spoken ones. To improve performance over languages, we propose a denoising autoencoder with semi-supervision using a continuous metric loss based on either activation or valence. The novelty of this work lies in our proposal of continuous metric learning, which is among the first proposals on the topic to the best of our knowledge. Furthermore, to address the lack of…
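
The abstract names a continuous metric loss on activation or valence but does not give its form. The following is a minimal sketch of one plausible reading, not the authors' formulation: it assumes the loss asks pairwise distances between latent embeddings to match pairwise differences in a continuous label, and that it is added to a reconstruction error with a weighting factor. The function names `continuous_metric_loss` and `semi_supervised_loss` and the parameter `weight` are hypothetical.

```python
import numpy as np


def continuous_metric_loss(embeddings, labels):
    """Mean squared gap between embedding distances and label differences.

    Assumption: this pairwise form is an illustration only; the paper's
    actual loss is not specified in the abstract.
    """
    z = np.asarray(embeddings, dtype=float)  # shape (n, d)
    y = np.asarray(labels, dtype=float)      # shape (n,), e.g. valence scores
    # Pairwise Euclidean distances between embeddings, shape (n, n).
    dz = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    # Pairwise absolute differences between continuous labels, shape (n, n).
    dy = np.abs(y[:, None] - y[None, :])
    # Average over distinct pairs only (upper triangle, diagonal excluded).
    iu = np.triu_indices(len(y), k=1)
    return float(np.mean((dz[iu] - dy[iu]) ** 2))


def semi_supervised_loss(x_clean, x_reconstructed, embeddings, labels, weight=0.5):
    """Reconstruction error plus a weighted metric term (hypothetical weighting)."""
    diff = np.asarray(x_clean, dtype=float) - np.asarray(x_reconstructed, dtype=float)
    recon = float(np.mean(diff ** 2))
    return recon + weight * continuous_metric_loss(embeddings, labels)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 8))                 # stand-in for clean speech features
    x_hat = x + 0.1 * rng.normal(size=x.shape)  # stand-in for decoder output
    z = rng.normal(size=(4, 2))                 # stand-in for latent embeddings
    valence = np.array([0.1, 0.4, 0.5, 0.9])    # made-up continuous labels
    print(semi_supervised_loss(x, x_hat, z, valence))
```

Under this reading, utterances with similar valence (or activation) are pulled together in the latent space and dissimilar ones are pushed apart in proportion to their label difference, which is what distinguishes a continuous metric loss from one built on discrete emotion classes.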

Taxonomy

Topics: Emotion and Mood Recognition · Speech Recognition and Synthesis · Speech and Audio Processing

Methods: Attention Is All You Need · Linear Layer · Residual Connection · Softmax · Dropout · Weight Decay · Dense Connections · Attention Dropout · Multi-Head Attention