On Enhancing Speech Emotion Recognition using Generative Adversarial Networks
Saurabh Sahu, Rahul Gupta, Carol Espy-Wilson

TL;DR
This paper explores using GANs to generate synthetic speech emotion features to improve classifier performance, comparing vanilla and conditional GANs across multiple datasets.
Contribution
It introduces a novel application of GANs for augmenting speech emotion recognition data with synthetic features, enhancing classifier accuracy.
Findings
Synthetic data improves classifier performance.
Conditional GANs outperform vanilla GANs.
Cross-corpus validation confirms robustness.
Abstract
Generative Adversarial Networks (GANs) have gained a lot of attention from the machine learning community due to their ability to learn and mimic an input data distribution. GANs consist of a discriminator and a generator that play a min-max game to learn a target data distribution when fed data points sampled from a simpler distribution (such as a uniform or Gaussian distribution). Once trained, they allow synthetic generation of examples sampled from the target distribution. We investigate the application of GANs to generate synthetic feature vectors used for speech emotion recognition. Specifically, we investigate two setups: (i) a vanilla GAN that learns the distribution of a lower-dimensional representation of the actual higher-dimensional feature vector, and (ii) a conditional GAN that learns the distribution of the higher-dimensional feature vectors…
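For readers unfamiliar with the min-max game mentioned in the abstract, the standard GAN objective can be written as below. This is the textbook formulation, not necessarily the exact loss used in the paper; the symbols (generator G, discriminator D, noise prior p_z, data distribution p_data, and condition y, assumed here to be the emotion label) are introduced only for illustration.

```latex
% Vanilla GAN: D tries to maximize V, G tries to minimize it.
\min_{G}\,\max_{D}\; V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}}\big[\log\big(1 - D(G(z))\big)\big]

% Conditional GAN: both networks additionally receive a condition y
% (assumed here to be the emotion class label).
\min_{G}\,\max_{D}\; V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x \mid y)\big]
  + \mathbb{E}_{z \sim p_{z}}\big[\log\big(1 - D(G(z \mid y) \mid y)\big)\big]
```

Here z is the sample drawn from the simpler (uniform or Gaussian) distribution, and G(z) is the synthetic feature vector that the discriminator must tell apart from real ones.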
Taxonomy
Methods: Convolution
