Improving Speaker-independent Speech Emotion Recognition Using Dynamic Joint Distribution Adaptation

Cheng Lu; Yuan Zong; Hailun Lian; Yan Zhao; Björn Schuller; Wenming Zheng

arXiv:2401.09752·cs.SD·January 19, 2024·1 cites

Open Access

TL;DR

This paper introduces a Dynamic Joint Distribution Adaptation method to improve speaker-independent speech emotion recognition by reducing speaker bias and better handling multi-domain distribution shifts, resulting in superior performance.

Contribution

The paper proposes a novel DJDA approach that dynamically balances marginal and conditional distribution adaptation for speaker-invariant emotion recognition.

Findings

01. DJDA outperforms state-of-the-art methods in experiments.

02. DJDA effectively reduces speaker bias in emotion features.

03. Dynamic balancing of marginal and conditional adaptation improves adaptation to new speakers.

Abstract

In speaker-independent speech emotion recognition, the training and testing samples are collected from diverse speakers, leading to a multi-domain shift challenge across the feature distributions of data from different speakers. Consequently, when the trained model is confronted with data from new speakers, its performance tends to degrade. To address this issue, we propose a Dynamic Joint Distribution Adaptation (DJDA) method under the framework of multi-source domain adaptation. DJDA first uses joint distribution adaptation (JDA), involving marginal distribution adaptation (MDA) and conditional distribution adaptation (CDA), to more precisely measure the multi-domain distribution shifts caused by different speakers. This helps eliminate speaker bias in emotion features, allowing for learning discriminative and speaker-invariant speech emotion features from coarse-level to…
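
As a rough illustration of the idea of combining a marginal and a conditional shift measure, the sketch below computes both between two speakers' feature sets and mixes them with a weight. It is not the paper's implementation: the linear-kernel MMD (squared distance between feature means), the function names, the toy data, and the balance weight `mu` are all assumptions made for illustration. How DJDA actually sets the balance dynamically is not stated in the text above, so `mu` is simply passed in here. In an unlabeled target domain, `tgt_labels` would have to be pseudo-labels.

```python
from typing import List, Sequence

Vector = Sequence[float]


def mean_vector(rows: Sequence[Vector]) -> List[float]:
    """Element-wise mean of a non-empty list of equal-length feature vectors."""
    n = len(rows)
    dim = len(rows[0])
    return [sum(r[d] for r in rows) / n for d in range(dim)]


def linear_mmd(a: Sequence[Vector], b: Sequence[Vector]) -> float:
    """Squared distance between feature means (MMD with a linear kernel)."""
    ma, mb = mean_vector(a), mean_vector(b)
    return sum((x - y) ** 2 for x, y in zip(ma, mb))


def marginal_shift(src: Sequence[Vector], tgt: Sequence[Vector]) -> float:
    """Shift between the overall feature distributions, ignoring labels."""
    return linear_mmd(src, tgt)


def conditional_shift(src, src_labels, tgt, tgt_labels) -> float:
    """Average class-wise shift over the emotion classes present in both sets."""
    shared = sorted(set(src_labels) & set(tgt_labels))
    if not shared:
        return 0.0
    total = 0.0
    for c in shared:
        src_c = [x for x, y in zip(src, src_labels) if y == c]
        tgt_c = [x for x, y in zip(tgt, tgt_labels) if y == c]
        total += linear_mmd(src_c, tgt_c)
    return total / len(shared)


def joint_shift(src, src_labels, tgt, tgt_labels, mu: float) -> float:
    """(1 - mu) * marginal + mu * conditional; mu is a hypothetical weight."""
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    m = marginal_shift(src, tgt)
    c = conditional_shift(src, src_labels, tgt, tgt_labels)
    return (1.0 - mu) * m + mu * c


# Toy data: two speakers, two emotion classes (0 and 1), 2-D features.
speaker_a = [[0.0, 0.0], [2.0, 0.0]]
labels_a = [0, 1]
speaker_b = [[2.0, 0.0], [0.0, 0.0]]
labels_b = [0, 1]

marginal = marginal_shift(speaker_a, speaker_b)
conditional = conditional_shift(speaker_a, labels_a, speaker_b, labels_b)
joint = joint_shift(speaker_a, labels_a, speaker_b, labels_b, mu=0.25)
```

In the toy data the two speakers have identical overall feature means, so the marginal term alone reports no shift, while the class-wise term exposes that the emotion classes are swapped. This is the kind of mismatch a conditional term is meant to catch, and it is why the relative weight of the two terms matters.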

Taxonomy

Topics: Speech Recognition and Synthesis · Emotion and Mood Recognition · Speech and Audio Processing