Layer-Adapted Implicit Distribution Alignment Networks for Cross-Corpus Speech Emotion Recognition

Yan Zhao; Yuan Zong; Jincen Wang; Hailun Lian; Cheng Lu; Li Zhao; Wenming Zheng

arXiv:2310.03992·cs.SD·October 9, 2023

Open Access

TL;DR

This paper introduces LIDAN, a novel layer-adapted implicit distribution alignment network that improves cross-corpus speech emotion recognition by aligning feature distributions across different speech datasets without assuming explicit distribution forms.

Contribution

LIDAN extends previous implicit distribution alignment methods by incorporating layer-specific alignment terms, enhancing emotion-discriminative and corpus-invariant feature learning for SER.

Findings

01

LIDAN outperforms recent state-of-the-art methods in cross-corpus SER tasks.

02

The layer-adapted approach effectively aligns features across diverse speech corpora.

03

The method makes no explicit assumption about the form of the feature distributions. Instead, it aligns them through reconstruction of target samples.

Abstract

In this paper, we propose a new unsupervised domain adaptation (DA) method called layer-adapted implicit distribution alignment networks (LIDAN) to address the challenge of cross-corpus speech emotion recognition (SER). LIDAN extends our previous ICASSP work, deep implicit distribution alignment networks (DIDAN), whose key contribution is a novel regularization term called implicit distribution alignment (IDA). This term allows DIDAN, trained on source (training) speech samples, to remain applicable when predicting emotion labels for target (testing) speech samples, regardless of the corpus variance in cross-corpus SER. To further enhance this method, we extend IDA to layer-adapted IDA (LIDA), resulting in LIDAN. This layer-adapted extension consists of three modified IDA terms that consider emotion labels at different levels of granularity. These terms are…
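The findings describe IDA as aligning source and target distributions through target sample reconstruction rather than through an explicit distribution metric. The sketch below illustrates that general idea under our own assumptions. Target feature vectors are reconstructed as ridge-regularized linear combinations of source feature vectors, and the mean squared residual serves as an alignment penalty. The function name `reconstruction_alignment_loss`, the ridge weight `lam`, and the choice of linear least-squares reconstruction are all hypothetical. This is not the DIDAN or LIDAN formulation, and it does not model the three label-granularity terms of LIDA.

```python
import numpy as np


def reconstruction_alignment_loss(source_feats, target_feats, lam=1e-2):
    """Hypothetical reconstruction-based alignment penalty (illustration only).

    Each target feature vector t_i is approximated as w_i @ S, a linear
    combination of the source feature vectors (rows of S). W is found with
    ridge-regularized least squares:
        min_W ||T - W S||_F^2 + lam * ||W||_F^2
    whose closed form is W = T S^T (S S^T + lam I)^{-1}.
    A small residual means the target features are well explained by the
    source features; a large residual signals a distribution mismatch.
    """
    S = np.asarray(source_feats, dtype=float)  # shape (n_source, d)
    T = np.asarray(target_feats, dtype=float)  # shape (n_target, d)
    # Regularized Gram matrix of the source features, shape (n_source, n_source).
    gram = S @ S.T + lam * np.eye(S.shape[0])
    # The Gram matrix is symmetric, so W^T = gram^{-1} S T^T.
    W = np.linalg.solve(gram, S @ T.T).T  # shape (n_target, n_source)
    residual = T - W @ S
    return float(np.mean(residual ** 2)), W


if __name__ == "__main__":
    source = np.eye(3)[:2]                   # two source vectors spanning e1 and e2
    target_in_span = np.array([[1.0, 2.0, 0.0]])
    target_off_span = np.array([[0.0, 0.0, 1.0]])
    loss_in, _ = reconstruction_alignment_loss(source, target_in_span)
    loss_off, _ = reconstruction_alignment_loss(source, target_off_span)
    print(f"in-span loss: {loss_in:.6f}, off-span loss: {loss_off:.6f}")
```

In a deep network, a penalty of this kind would presumably be computed on mini-batch features and added to the supervised emotion-classification loss on source samples. The closed-form solve keeps the sketch short. It is only one of many possible ways to realize "alignment by reconstruction".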

Taxonomy

Topics: Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining · Speech Recognition and Synthesis