Adversarial Auto-encoders for Speech Based Emotion Recognition
Saurabh Sahu, Rahul Gupta, Ganesh Sivaraman, Wael AbdAlmageed, Carol Espy-Wilson

TL;DR
This paper explores the use of adversarial autoencoders for speech emotion recognition, focusing on compressing emotional features and generating synthetic samples to improve classifier training.
Contribution
It demonstrates the effectiveness of adversarial autoencoders in encoding emotional speech features and generating synthetic data for emotion recognition tasks.
Findings
Effective encoding of high-dimensional emotional features with minimal discriminability loss
Successful generation of synthetic emotional speech samples
Potential for improved emotion recognition classifier training
Abstract
Recently, generative adversarial networks and adversarial autoencoders have gained a lot of attention in the machine learning community due to their exceptional performance in tasks such as digit classification and face recognition. They map the autoencoder's bottleneck layer output (termed code vectors) to different noise Probability Distribution Functions (PDFs), which can be further regularized to cluster based on class information. In addition, they allow the generation of synthetic samples by sampling the code vectors from the mapped PDFs. Inspired by these properties, we investigate the application of adversarial autoencoders to the domain of emotion recognition. Specifically, we conduct experiments on the following two aspects: (i) their ability to encode high dimensional feature vector representations for emotional utterances into a compressed space (with a minimal loss of…
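The generation step described in the abstract (sample a code vector from the PDF the bottleneck was mapped to, then decode it) can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the prior is assumed to be a mixture of 2-D Gaussians with one component per emotion class, the decoder is a single affine layer with random placeholder weights rather than a trained network, and the names `sample_code_vector`, `decode`, `NUM_CLASSES`, `CODE_DIM`, and `FEATURE_DIM` are hypothetical.

```python
import math
import random


def sample_code_vector(class_index, num_classes, rng, radius=4.0, std=0.5):
    """Draw a 2-D code vector from the Gaussian component assumed for one class.

    Component means are spaced evenly on a circle so that classes occupy
    separate regions of the code space (an illustrative choice of prior).
    """
    angle = 2.0 * math.pi * class_index / num_classes
    mean_x = radius * math.cos(angle)
    mean_y = radius * math.sin(angle)
    return [rng.gauss(mean_x, std), rng.gauss(mean_y, std)]


def decode(code, weights, bias):
    """Stand-in decoder: one affine layer mapping a code vector to features."""
    return [
        sum(w * c for w, c in zip(row, code)) + b
        for row, b in zip(weights, bias)
    ]


rng = random.Random(0)
NUM_CLASSES = 4   # assumed number of emotion classes
CODE_DIM = 2      # assumed size of the bottleneck layer
FEATURE_DIM = 8   # assumed; real utterance-level feature vectors are far larger

# Placeholder decoder parameters. In an actual adversarial autoencoder these
# would be learned jointly with the encoder and the discriminator.
weights = [[rng.uniform(-1.0, 1.0) for _ in range(CODE_DIM)]
           for _ in range(FEATURE_DIM)]
bias = [0.0] * FEATURE_DIM

# Generate five synthetic feature vectors per class by sampling and decoding.
synthetic = []
for label in range(NUM_CLASSES):
    for _ in range(5):
        code = sample_code_vector(label, NUM_CLASSES, rng)
        synthetic.append((label, decode(code, weights, bias)))
```

Each entry of `synthetic` pairs a class label with a decoded feature vector, which is the form in which such samples could be added to a classifier's training set. With an untrained decoder the vectors carry no emotional information; the sketch only shows the data flow.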
