Personalized Adaptation with Pre-trained Speech Encoders for Continuous Emotion Recognition

Minh Tran; Yufeng Yin; Mohammad Soleymani

arXiv:2309.02418 · eess.AS · September 6, 2023 · 1 citation

TL;DR

This paper introduces an unsupervised method for personalized speech emotion recognition. It combines a speech encoder pre-trained with learnable speaker embeddings and a compensation step for label distribution shift, and it achieves state-of-the-art valence estimation on the MSP-Podcast corpus.

Contribution

The paper proposes an unsupervised personalization approach that combines speaker-conditioned self-supervised pre-training with label shift compensation based on the label distributions of similar training speakers.

Findings

01 Outperforms existing personalization baselines.

02 Achieves state-of-the-art valence estimation performance.

03 Demonstrates robustness across diverse speakers.

Abstract

There are individual differences in expressive behaviors driven by cultural norms and personality. This between-person variation can result in reduced emotion recognition performance. Therefore, personalization is an important step in improving the generalization and robustness of speech emotion recognition. In this paper, to achieve unsupervised personalized emotion recognition, we first pre-train an encoder with learnable speaker embeddings in a self-supervised manner to learn robust speech representations conditioned on speakers. Second, we propose an unsupervised method to compensate for the label distribution shifts by finding similar speakers and leveraging their label distributions from the training set. Extensive experimental results on the MSP-Podcast corpus indicate that our method consistently outperforms strong personalization baselines and achieves state-of-the-art…
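
The second step in the abstract (finding similar speakers and borrowing their label distributions) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `calibrate_predictions`, the use of cosine similarity over speaker embeddings, the choice of `k` nearest speakers, and the mean/standard-deviation rescaling are all assumptions made for illustration. The sketch only shows the general idea of moving a test speaker's predictions toward label statistics estimated from similar training speakers, without using any labels from the test speaker.

```python
import numpy as np

def calibrate_predictions(preds, test_emb, train_embs, train_means, train_stds, k=3):
    """Hypothetical helper: rescale one test speaker's predictions toward the
    label statistics of the k most similar training speakers."""
    # Cosine similarity between the test speaker and every training speaker
    # (assumed similarity measure, not confirmed by the source).
    test_unit = test_emb / np.linalg.norm(test_emb)
    train_unit = train_embs / np.linalg.norm(train_embs, axis=1, keepdims=True)
    sims = train_unit @ test_unit

    # Indices of the k most similar training speakers.
    nearest = np.argsort(sims)[-k:]

    # Estimate the test speaker's label mean and spread from those speakers.
    target_mean = train_means[nearest].mean()
    target_std = train_stds[nearest].mean()

    # Standardize the predictions, then map them onto the estimated statistics.
    pred_std = preds.std()
    if pred_std == 0:
        return np.full_like(preds, target_mean, dtype=float)
    return (preds - preds.mean()) / pred_std * target_std + target_mean


# Toy usage with made-up embeddings and per-speaker label statistics.
train_embs = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
train_means = np.array([5.0, 2.0, 4.0])
train_stds = np.array([1.0, 3.0, 2.0])
preds = np.array([3.0, 4.0, 5.0])
calibrated = calibrate_predictions(
    preds, np.array([1.0, 0.0]), train_embs, train_means, train_stds, k=2
)
```

In the toy call, the two nearest training speakers are the first and third, so the calibrated predictions keep their ordering but take on the average mean and spread of those two speakers. All numbers here are invented for the example.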

Taxonomy

Topics: Emotion and Mood Recognition · Speech Recognition and Synthesis · Sentiment Analysis and Opinion Mining