Dataset-Distillation Generative Model for Speech Emotion Recognition

Fabian Ritter-Gutierrez; Kuan-Po Huang; Jeremy H.M Wong; Dianwen Ng; Hung-yi Lee; Nancy F. Chen; Eng Siong Chng

arXiv:2406.02963 · cs.SD · June 6, 2024

TL;DR

This paper introduces a novel dataset distillation method using GANs for speech emotion recognition, reducing data size and training time while maintaining or improving performance and enhancing privacy.

Contribution

First application of dataset distillation to speech, using a GAN for speech emotion recognition on IEMOCAP, enabling smaller datasets and faster training without loss of performance.

Findings

1. Performs comparably to training on the original data when the synthetic dataset follows the original class imbalance.
2. Improves performance by 0.3% absolute UAR (unweighted average recall) when classes are balanced.
3. Reduces dataset storage and accelerates downstream training by 95%.
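The findings report gains in unweighted average recall (UAR): the mean of the per-class recalls, so every emotion class counts equally no matter how many examples it has. The minimal sketch below illustrates the metric with hypothetical labels; it is not taken from the paper's code.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; each class counts equally regardless of its size."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = defaultdict(int)    # number of true examples per class
    correct = defaultdict(int)  # how many of those were predicted correctly
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical labels: 4 "neu" examples and 2 "ang" examples.
y_true = ["neu", "neu", "neu", "neu", "ang", "ang"]
y_pred = ["neu", "neu", "neu", "neu", "neu", "ang"]
print(unweighted_average_recall(y_true, y_pred))  # 0.75
```

Here plain accuracy would be 5/6 (about 0.83), while UAR is (1.0 + 0.5) / 2 = 0.75: the single error on the minority class costs more under UAR. This is why UAR is a natural metric for an imbalanced dataset such as IEMOCAP and why class balance in the synthetic data can affect it.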

Abstract

Deep learning models for speech rely on large datasets, which presents computational challenges, yet performance hinges on the size of the training data. Dataset Distillation (DD) aims to learn a smaller dataset that causes little performance degradation when used for training. DD has been investigated in computer vision but not yet in speech. This paper presents the first approach to DD for speech, targeting Speech Emotion Recognition (SER) on IEMOCAP. We employ Generative Adversarial Networks (GANs) not to mimic real data but to distil the key discriminative information in IEMOCAP that is useful for downstream training. The GAN then replaces the original dataset and can sample synthetic datasets of custom size. The approach performs comparably when the synthetic data follows the original class imbalance, and improves performance by 0.3% absolute unweighted average recall (UAR) with balanced classes. It also reduces dataset storage and accelerates downstream training by 95%…
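The abstract describes sampling synthetic datasets of custom size from the trained GAN, either following the original class imbalance or with balanced classes. Below is a minimal sketch of that sampling step under stated assumptions: the generator is assumed to be class-conditional, `toy_generator` is a hypothetical stand-in that returns random noise (not a trained GAN), the class weights are illustrative rather than IEMOCAP statistics, and the largest-remainder split is a choice made for this sketch, not a detail from the paper.

```python
import random


def per_class_counts(total, class_weights):
    """Split `total` samples across classes in proportion to integer `class_weights`.

    Largest-remainder rounding guarantees the counts sum exactly to `total`.
    """
    weight_sum = sum(class_weights.values())
    counts = {c: total * w // weight_sum for c, w in class_weights.items()}
    remainders = {c: total * w % weight_sum for c, w in class_weights.items()}
    leftover = total - sum(counts.values())
    # Hand the remaining samples to the classes with the largest remainders.
    for c in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        counts[c] += 1
    return counts


def sample_synthetic_dataset(generator, total, class_weights, seed=0):
    """Draw a labeled synthetic dataset of any size from a class-conditional generator."""
    rng = random.Random(seed)
    dataset = []
    for label, n in per_class_counts(total, class_weights).items():
        for _ in range(n):
            dataset.append((generator(label, rng), label))
    rng.shuffle(dataset)
    return dataset


def toy_generator(label, rng, dim=4):
    """Hypothetical stand-in for a trained GAN generator; ignores `label`, returns noise."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


# Hypothetical integer class weights (not IEMOCAP statistics).
imbalanced = {"neu": 4, "ang": 2, "hap": 3, "sad": 1}
balanced = {c: 1 for c in imbalanced}

print(per_class_counts(100, imbalanced))  # {'neu': 40, 'ang': 20, 'hap': 30, 'sad': 10}
print(per_class_counts(100, balanced))    # {'neu': 25, 'ang': 25, 'hap': 25, 'sad': 25}
print(len(sample_synthetic_dataset(toy_generator, 100, balanced)))  # 100
```

Because the generator, not a stored dataset, is what gets kept, the same function can produce a 100-sample or a 10,000-sample training set, and switching between imbalanced and balanced class proportions only changes the weights passed in.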

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Advanced Data Compression Techniques