Supervised Contrastive Learning with Nearest Neighbor Search for Speech Emotion Recognition

Xuechen Wang; Shiwan Zhao; Yong Qin

arXiv:2308.16485 · eess.AS · September 1, 2023

TL;DR

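This paper improves Speech Emotion Recognition (SER) by combining supervised contrastive learning with nearest neighbor search. It fine-tunes the pre-trained wav2vec2.0 model with a loss that adds a supervised contrastive term to cross-entropy, which sharpens the boundaries between emotion classes, and it interpolates the model prediction with a k-nearest neighbors output at inference.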

Contribution

It introduces a combined loss function (cross-entropy plus supervised contrastive loss) and an inference-time interpolation method based on nearest neighbor search, improving SER performance when labeled data is limited.
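
The combined loss can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the weighting scheme (`ce + alpha * scl`), the hypothetical parameters `alpha` and `temperature`, and the use of cosine similarity on L2-normalized embeddings are all assumptions, following the commonly used supervised contrastive formulation.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross-entropy from raw logits (numerically stable log-softmax)."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over one batch (needs at least 2 samples).

    Samples sharing a label are pulled together, others pushed apart.
    `temperature` is an assumed hyperparameter, not a value from the paper.
    """
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)
    # Pairwise scaled cosine similarity; each anchor ignores itself.
    sim = np.where(not_self, z @ z.T / temperature, -np.inf)
    sim_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - sim_max - np.log(
        np.exp(sim - sim_max).sum(axis=1, keepdims=True)
    )
    positives = (labels[:, None] == labels[None, :]) & not_self
    pos_count = positives.sum(axis=1)
    has_pos = pos_count > 0  # anchors with no same-class partner are skipped
    if not has_pos.any():
        return 0.0
    sum_pos = np.where(positives, log_prob, 0.0).sum(axis=1)
    return (-sum_pos[has_pos] / pos_count[has_pos]).mean()

def combined_loss(logits, embeddings, labels, alpha=0.5, temperature=0.1):
    """Assumed weighting: cross-entropy plus alpha times the contrastive term."""
    return cross_entropy(logits, labels) + alpha * supcon_loss(
        embeddings, labels, temperature
    )
```

In this sketch, a batch whose same-class embeddings are close together yields a much smaller contrastive term than one whose classes are mixed, which is the behavior the loss is meant to reward.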

Findings

01 Outperforms prior state-of-the-art results on the IEMOCAP dataset

02 Improves inter-class separation and intra-class compactness

03 Enhances model robustness with limited data

Abstract

Speech Emotion Recognition (SER) is a challenging task due to limited data and blurred boundaries of certain emotions. In this paper, we present a comprehensive approach to improve the SER performance throughout the model lifecycle, including pre-training, fine-tuning, and inference stages. To address the data scarcity issue, we utilize a pre-trained model, wav2vec2.0. During fine-tuning, we propose a novel loss function that combines cross-entropy loss with supervised contrastive learning loss to improve the model's discriminative ability. This approach increases the inter-class distances and decreases the intra-class distances, mitigating the issue of blurred boundaries. Finally, to leverage the improved distances, we propose an interpolation method at the inference stage that combines the model prediction with the output from a k-nearest neighbors model. Our experiments on IEMOCAP…
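
The inference-stage interpolation described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: the abstract does not give the exact form, so the convex combination with a hypothetical weight `lam`, the neighbor count `k`, majority-vote neighbor probabilities, and cosine similarity as the distance are all choices made here for the example.

```python
import numpy as np

def knn_probs(query, bank_embeddings, bank_labels, num_classes, k=5):
    """Class distribution from the k nearest stored training embeddings.

    Uses cosine similarity and unweighted votes (both assumptions).
    """
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    b = bank_embeddings / np.linalg.norm(bank_embeddings, axis=1, keepdims=True)
    sim = q @ b.T                                   # (n_query, n_bank)
    k = min(k, b.shape[0])
    top_idx = np.argsort(-sim, axis=1)[:, :k]       # most similar first
    neighbor_labels = bank_labels[top_idx]          # (n_query, k)
    probs = np.zeros((q.shape[0], num_classes))
    for c in range(num_classes):
        probs[:, c] = (neighbor_labels == c).mean(axis=1)
    return probs

def interpolate(model_probs, neighbor_probs, lam=0.5):
    """Assumed convex combination of classifier and kNN distributions."""
    return lam * neighbor_probs + (1.0 - lam) * model_probs

# Toy usage: two emotion classes, four stored training embeddings.
bank = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
bank_y = np.array([0, 0, 1, 1])
query = np.array([[1.0, 0.05]])
model_p = np.array([[0.4, 0.6]])      # classifier slightly prefers class 1
final_p = interpolate(model_p, knn_probs(query, bank, bank_y, 2, k=2))
prediction = final_p.argmax(axis=1)
```

In the toy case the classifier alone leans toward class 1, but both nearest neighbors belong to class 0, so the interpolated distribution picks class 0. The better the embedding space separates classes, the more reliable the neighbor vote becomes, which is why this step builds on the contrastive fine-tuning.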

Taxonomy

Methods: Contrastive Learning