TL;DR
EMO-DNA is an unsupervised domain adaptation framework for cross-corpus speech emotion recognition. It uses contrastive emotion decoupling and dual-level alignment to improve generalization across speech datasets.
Contribution
The paper proposes EMO-DNA, a new unsupervised domain adaptation (UDA) method that decouples emotion features from corpus-specific features and aligns the emotion features at two levels to improve cross-corpus speech emotion recognition (SER) performance.
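As a rough illustration of the two ideas named above, the Python sketch below computes a decoupling penalty and two alignment gaps on toy feature vectors. It is not the paper's loss. The cosine-based penalty, the centroid-distance alignment terms, and every function name are assumptions made for illustration, and the target pseudo-labels are simply taken as given.

```python
import math


def mean_vector(rows):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def decoupling_penalty(emotion_feats, corpus_feats):
    """Assumed stand-in for decoupling: mean squared cosine similarity between
    each utterance's emotion feature and its corpus-specific feature.
    It is 0 when the two are orthogonal and 1 when they are collinear."""
    sims = [cosine(e, c) ** 2 for e, c in zip(emotion_feats, corpus_feats)]
    return sum(sims) / len(sims)


def corpus_level_gap(src_feats, tgt_feats):
    """Assumed global alignment term: distance between the corpus means."""
    return squared_distance(mean_vector(src_feats), mean_vector(tgt_feats))


def class_level_gap(src_feats, src_labels, tgt_feats, tgt_pseudo_labels):
    """Assumed class alignment term: average distance between per-emotion
    centroids, using pseudo-labels for the unlabeled target corpus."""
    shared = sorted(set(src_labels) & set(tgt_pseudo_labels))
    if not shared:
        return 0.0
    total = 0.0
    for label in shared:
        src_c = mean_vector([f for f, y in zip(src_feats, src_labels) if y == label])
        tgt_c = mean_vector([f for f, y in zip(tgt_feats, tgt_pseudo_labels) if y == label])
        total += squared_distance(src_c, tgt_c)
    return total / len(shared)


# Toy case: both corpora have the same overall mean, but the emotions are swapped.
src = [[0.0, 0.0], [2.0, 2.0]]
tgt = [[0.0, 0.0], [2.0, 2.0]]
global_gap = corpus_level_gap(src, tgt)
per_class_gap = class_level_gap(src, ["happy", "sad"], tgt, ["sad", "happy"])
```

In the toy case the global gap is zero while the per-class gap is not, which is one way to see why aligning only global distributions can leave features that are not class-discriminative.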
Findings
Outperforms state-of-the-art methods in cross-corpus scenarios
Effective emotion feature decoupling enhances class discrimination
Dual-level alignment improves model generalization
Abstract
Cross-corpus speech emotion recognition (SER) seeks to generalize the ability to infer speech emotion from a well-labeled corpus to an unlabeled one, a challenging task because of the significant discrepancy between the two corpora. Existing methods, typically based on unsupervised domain adaptation (UDA), try to learn corpus-invariant features by global distribution alignment, but the resulting features are mixed with corpus-specific features or are not class-discriminative. To tackle these challenges, we propose the Emotion Decoupling aNd Alignment learning framework (EMO-DNA) for cross-corpus SER, a novel UDA method to learn emotion-relevant corpus-invariant features. The novelties of EMO-DNA are two-fold: contrastive emotion decoupling and dual-level emotion alignment. On one hand, our contrastive emotion decoupling achieves decoupling learning via a…
