Domain Adversarial for Acoustic Emotion Recognition
Mohammed Abdelwahab, Carlos Busso

TL;DR
This paper proposes adversarial multitask training with a gradient reversal layer to learn domain-invariant features from unlabeled data, improving speech emotion recognition when the training and test data come from different domains.
Contribution
It introduces a novel adversarial training method with a gradient reversal layer to extract shared representations for emotion recognition across domains, enhancing performance with unlabeled data.
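To make the gradient reversal idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the class name `GradientReversal`, the helper `feature_gradient`, the weight `lam`, and the toy gradient values are all assumptions made for illustration. The layer passes features through unchanged on the forward pass and flips the sign of the domain classifier's gradient on the backward pass, so the shared layers are pushed to help the emotion task while confusing the domain classifier.

```python
class GradientReversal:
    """Hypothetical gradient reversal layer (illustrative, not the paper's code)."""

    def __init__(self, lam=1.0):
        # lam is an assumed trade-off weight for the reversed domain gradient.
        self.lam = lam

    def forward(self, features):
        # Forward pass is the identity: features reach the domain classifier unchanged.
        return list(features)

    def backward(self, grad_from_domain_classifier):
        # Backward pass multiplies the incoming gradient by -lam.
        return [-self.lam * g for g in grad_from_domain_classifier]


def feature_gradient(grad_emotion, grad_domain, grl):
    """Combine both task gradients as seen by the shared feature layers."""
    reversed_domain = grl.backward(grad_domain)
    return [ge + gd for ge, gd in zip(grad_emotion, reversed_domain)]


grl = GradientReversal(lam=0.5)
features = [0.2, -1.0, 3.0]
assert grl.forward(features) == features  # identity on the forward pass

# Toy gradients (made-up numbers) arriving at the shared representation.
combined = feature_gradient([1.0, 1.0, 1.0], [2.0, -4.0, 0.0], grl)
print(combined)  # [0.0, 3.0, 1.0]
```

In a deep learning framework the same behavior would normally be written as a custom autograd operation; the list arithmetic here only shows the sign flip.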
Findings
Unsupervised domain adaptation improves emotion recognition accuracy.
Deeper neural networks facilitate better domain-invariant feature learning.
Adversarial training aligns source and target domain representations.
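The alignment described in these findings is commonly written as a saddle-point objective in gradient-reversal training. The sketch below follows that common formulation and is not notation confirmed by this summary; the symbols $\theta_f$ (shared feature layers), $\theta_y$ (emotion classifier), $\theta_d$ (domain classifier), $L_y$ (emotion loss on labeled source data), $L_d$ (domain loss on source and target data), and $\lambda$ (trade-off weight) are assumptions introduced for illustration.

```latex
E(\theta_f, \theta_y, \theta_d) = L_y(\theta_f, \theta_y) - \lambda \, L_d(\theta_f, \theta_d)

(\hat{\theta}_f, \hat{\theta}_y) = \arg\min_{\theta_f, \theta_y} E(\theta_f, \theta_y, \hat{\theta}_d)

\hat{\theta}_d = \arg\max_{\theta_d} E(\hat{\theta}_f, \hat{\theta}_y, \theta_d)
```

Under this reading, the domain classifier tries to tell source from target while the shared layers are trained to make that distinction hard, which is what pulls the two domains' representations together.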
Abstract
The performance of speech emotion recognition is affected by the differences in data distributions between train (source domain) and test (target domain) sets used to build and evaluate the models. This is a common problem, as multiple studies have shown that the performance of emotional classifiers drops when they are exposed to data that does not match the distribution used to build the emotion classifiers. The difference in data distributions becomes very clear when the training and testing data come from different domains, causing a large performance gap between validation and testing performance. Due to the high cost of annotating new data and the abundance of unlabeled data, it is crucial to extract as much useful information as possible from the available unlabeled data. This study looks into the use of adversarial multitask training to extract a common representation between…
