Deep Implicit Distribution Alignment Networks for Cross-Corpus Speech Emotion Recognition

Yan Zhao; Jincen Wang; Yuan Zong; Wenming Zheng; Hailun Lian; Li Zhao

arXiv:2302.08921·cs.SD·February 20, 2023

Open Access

TL;DR

This paper introduces DIDAN, a deep transfer learning approach that aligns speech emotion features across different corpora using implicit distribution alignment, improving cross-corpus speech emotion recognition accuracy.

Contribution

The paper presents a novel implicit distribution alignment regularization for deep transfer learning in cross-corpus speech emotion recognition, outperforming existing methods.

Findings

1. DIDAN outperforms state-of-the-art methods in cross-corpus SER tasks.

2. Implicit distribution alignment effectively reduces the distribution gap between corpora.

3. DIDAN maintains emotion-discriminative ability across different speech corpora.

Abstract

In this paper, we propose a novel deep transfer learning method called deep implicit distribution alignment networks (DIDAN) to deal with the cross-corpus speech emotion recognition (SER) problem, in which the labeled training (source) and unlabeled testing (target) speech signals come from different corpora. Specifically, DIDAN first adopts a simple deep regression network, consisting of a set of convolutional and fully connected layers, to directly regress the source speech spectrums onto the emotion labels, so that DIDAN acquires emotion-discriminative ability. This ability is then transferred to the target speech samples, regardless of corpus variance, by means of a well-designed regularization term called implicit distribution alignment (IDA). Unlike the widely used maximum mean discrepancy (MMD) and its variants, the proposed IDA absorbs the idea…
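The general shape of such an objective, a supervised regression loss on labeled source samples plus a weighted alignment penalty between source and target features, can be sketched as follows. This is only an illustration under stated assumptions: the abstract is cut off before IDA is defined, so the alignment term here is a linear-kernel MMD (the baseline the abstract says IDA differs from), not IDA itself. The function names and the trade-off weight `lam` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def feature_mean(feats: Matrix) -> List[float]:
    """Column-wise mean of a batch of feature vectors."""
    n = len(feats)
    dim = len(feats[0])
    return [sum(row[j] for row in feats) / n for j in range(dim)]


def linear_mmd(source_feats: Matrix, target_feats: Matrix) -> float:
    """Squared MMD with a linear kernel: squared distance between batch means.

    Stand-in for the alignment term; this is NOT the paper's IDA.
    """
    mu_s = feature_mean(source_feats)
    mu_t = feature_mean(target_feats)
    return sum((a - b) ** 2 for a, b in zip(mu_s, mu_t))


def regression_loss(pred: Matrix, labels: Matrix) -> float:
    """Mean squared error between predictions and label vectors."""
    total, count = 0.0, 0
    for p_row, y_row in zip(pred, labels):
        for p, y in zip(p_row, y_row):
            total += (p - y) ** 2
            count += 1
    return total / count


def total_loss(source_pred: Matrix, source_labels: Matrix,
               source_feats: Matrix, target_feats: Matrix,
               lam: float = 1.0) -> float:
    """Source regression loss plus lam times the alignment penalty.

    Target samples are unlabeled, so they enter only the alignment term.
    """
    return (regression_loss(source_pred, source_labels)
            + lam * linear_mmd(source_feats, target_feats))
```

In a real model the feature batches would be activations of the convolutional and fully connected layers, and the loss would be minimized by gradient descent. The sketch only shows how the two terms combine.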

Taxonomy

Topics: Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining · Speech Recognition and Synthesis

Methods: ALIGN