Iterative Prototype Refinement for Ambiguous Speech Emotion Recognition

Haoqin Sun; Shiwan Zhao; Xiangyu Kong; Xuechen Wang; Hui Wang; Jiaming Zhou; Yong Qin

arXiv:2408.00325 · cs.SD · August 2, 2024

Open Access

TL;DR

This paper introduces an iterative prototype refinement framework for speech emotion recognition that effectively models emotion ambiguity, improving representation quality and outperforming existing methods on the IEMOCAP dataset.

Contribution

We propose a novel iterative prototype refinement framework combining contrastive learning and dynamic prototype updating for ambiguous speech emotion recognition.

Findings

01 Superior performance on the IEMOCAP dataset

02 Effective modeling of emotion ambiguity

03 Enhanced representation quality

Abstract

Recognizing emotions from speech is a daunting task due to the subtlety and ambiguity of expressions. Traditional speech emotion recognition (SER) systems, which typically rely on a singular, precise emotion label, struggle with this complexity. Therefore, modeling the inherent ambiguity of emotions is an urgent problem. In this paper, we propose an iterative prototype refinement framework (IPR) for ambiguous SER. IPR comprises two interlinked components: contrastive learning and class prototypes. The former provides an efficient way to obtain high-quality representations of ambiguous samples. The latter are dynamically updated based on ambiguous labels -- the similarity of the ambiguous data to all prototypes. These refined embeddings yield precise pseudo labels, thus reinforcing representation quality. Experimental evaluations conducted on the IEMOCAP dataset validate the superior…
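The interplay between soft labels and prototypes described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the update rule, so the cosine-similarity softmax, the `temperature` and `momentum` parameters, and the function names `soft_labels`, `update_prototypes`, and `refine` are all assumptions made for illustration. The contrastive-learning component is omitted, and the embeddings are treated as fixed toy vectors.

```python
import math


def normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def soft_labels(embedding, prototypes, temperature=0.1):
    """Assumed rule: softmax over cosine similarities to every prototype."""
    e = normalize(embedding)
    sims = [
        sum(a * b for a, b in zip(e, normalize(p))) / temperature
        for p in prototypes
    ]
    top = max(sims)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in sims]
    total = sum(exps)
    return [x / total for x in exps]


def update_prototypes(prototypes, embedding, labels, momentum=0.9):
    """Assumed rule: move each prototype toward the embedding, weighted by its soft label."""
    e = normalize(embedding)
    updated = []
    for p, weight in zip(prototypes, labels):
        step = (1.0 - momentum) * weight
        mixed = [(1.0 - step) * pi + step * ei for pi, ei in zip(p, e)]
        updated.append(normalize(mixed))
    return updated


def refine(embeddings, prototypes, rounds=3):
    """Alternate between computing soft labels and updating prototypes."""
    for _ in range(rounds):
        for e in embeddings:
            labels = soft_labels(e, prototypes)
            prototypes = update_prototypes(prototypes, e, labels)
    return prototypes, [soft_labels(e, prototypes) for e in embeddings]


# Toy data: two classes in 2-D; the middle sample is deliberately ambiguous.
prototypes = [[1.0, 0.0], [0.0, 1.0]]
embeddings = [[0.9, 0.1], [0.6, 0.5], [0.1, 0.8]]
prototypes, labels = refine(embeddings, prototypes)
```

In this sketch an ambiguous sample contributes to several prototypes at once, in proportion to its similarity to each, rather than being forced into a single class.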

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing