Unsupervised Personalization of an Emotion Recognition System: The Unique Properties of the Externalization of Valence in Speech

Kusha Sridhar; Carlos Busso

arXiv:2201.07876 · cs.SD · May 15, 2023 · 1 citation

TL;DR

This paper introduces an unsupervised speaker adaptation method for speech emotion recognition. It improves valence prediction by adapting the model with speech from training speakers whose acoustic patterns resemble those of the target speaker, achieving relative improvements of up to 13.52%.

Contribution

It proposes three novel unsupervised adaptation strategies for personalizing valence prediction models in speech emotion recognition.

Findings

01

Unsupervised adaptation improves valence prediction accuracy.

02

Transfer learning enhances speaker-specific model performance.

03

Relative improvements of up to 13.52% are achieved.

Abstract

The prediction of valence from speech is an important but challenging problem. The externalization of valence in speech has speaker-dependent cues, which contribute to performances that are often significantly lower than the prediction of other emotional attributes such as arousal and dominance. A practical approach to improve valence prediction from speech is to adapt the models to the target speakers in the test set. Adapting a speech emotion recognition (SER) system to a particular speaker is a hard problem, especially with deep neural networks (DNNs), since it requires optimizing millions of parameters. This study proposes an unsupervised approach to address this problem by searching the train set for speakers whose acoustic patterns are similar to those of the speaker in the test set. Speech samples from the selected speakers are used to create the adaptation set. This approach leverages…
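The speaker-selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how acoustic similarity is measured, so the cosine similarity between per-speaker mean feature vectors used here is an assumption, as are the function names, the value of k, and the feature layout. It is not the authors' implementation.

```python
import math


def mean_vector(frames):
    """Average a list of equal-length feature vectors into one vector."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def cosine_similarity(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_similar_speakers(train_features, test_frames, k=2):
    """Return the k training speakers closest to the test speaker.

    ASSUMPTION: similarity is cosine similarity between mean acoustic
    feature vectors. No emotion labels from the test speaker are used,
    which is what makes the selection unsupervised.
    """
    target = mean_vector(test_frames)
    scored = [
        (cosine_similarity(mean_vector(frames), target), speaker)
        for speaker, frames in train_features.items()
    ]
    scored.sort(reverse=True)  # most similar speakers first
    return [speaker for _, speaker in scored[:k]]


def build_adaptation_set(train_features, selected):
    """Pool the (already labeled) training samples of the selected speakers."""
    return [(speaker, frame) for speaker in selected
            for frame in train_features[speaker]]
```

In this sketch, train_features is a hypothetical dictionary mapping each training speaker ID to a list of feature vectors, and test_frames holds the unlabeled feature vectors of the target speaker. The pooled samples returned by build_adaptation_set would then be used to adapt the pretrained model.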

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Emotion and Mood Recognition