Breaking Resource Barriers in Speech Emotion Recognition via Data Distillation
Yi Chang, Zhao Ren, Zhonghao Zhao, Thanh Tam Nguyen, Kun Qian, Tanja Schultz, Björn W. Schuller

TL;DR
This paper introduces a data distillation approach that synthesises a smaller dataset for speech emotion recognition (SER), aiming to reduce privacy risk and enable efficient model training on resource-constrained IoT devices with performance comparable to training on the full dataset.
Contribution
The paper proposes a novel data distillation framework that produces a compact, synthetic dataset for SER, addressing resource limitations and privacy concerns in IoT applications.
Findings
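SER models trained on the distilled dataset with fixed initialisation achieve performance comparable to models trained on the original full dataset.
The smaller dataset is intended to make SER model development feasible within the memory and compute limits of IoT devices.
Training on synthesised data rather than the original recordings is intended to reduce the risk of privacy leakage.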
Abstract
Speech emotion recognition (SER) plays a crucial role in human-computer interaction. The emergence of edge devices in the Internet of Things (IoT) presents challenges in constructing intricate deep learning models due to constraints in memory and computational resources. Moreover, emotional speech data often contains private information, raising concerns about privacy leakage during the deployment of SER models. To address these challenges, we propose a data distillation framework to facilitate efficient development of SER models in IoT applications using a synthesised, smaller, and distilled dataset. Our experiments demonstrate that the distilled dataset can be effectively utilised to train SER models with fixed initialisation, achieving performances comparable to those developed using the original full emotional speech dataset.
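To make the idea of training from a distilled dataset with fixed initialisation concrete, here is a minimal toy sketch. It is not the authors' framework: it assumes a generic form of dataset distillation in which the synthetic data is optimised by differentiating through a single training step, and it uses a linear regression model on random feature vectors as a stand-in for an SER model on acoustic features. All names and hyperparameters (`w0`, `inner_lr`, `meta_lr`, the dataset sizes) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "full" dataset. Assumption: random vectors stand in for acoustic features,
# and a regression target stands in for an emotion label.
n, d, m = 200, 5, 10
w_true = np.array([1.0, -2.0, 0.5, 0.0, 1.5])
X = rng.normal(size=(n, d))
y = X @ w_true + 0.1 * rng.normal(size=n)

w0 = 0.1 * rng.normal(size=d)  # fixed initialisation, shared by distillation and training
inner_lr = 1.0                 # step size of the single training step
meta_lr = 0.3                  # step size for updating the synthetic data


def train_one_step(Xs, ys):
    """One gradient step on the synthetic set, starting from the fixed w0."""
    r = Xs @ w0 - ys  # residuals of the initial model on the synthetic data
    return w0 - inner_lr * Xs.T @ r / len(ys), r


def real_loss(w):
    """Halved mean squared error on the full dataset."""
    return 0.5 * np.mean((X @ w - y) ** 2)


# Learnable synthetic dataset: m points instead of n.
Xs = rng.normal(size=(m, d))
ys = np.zeros(m)

baseline_loss = real_loss(w0)
c = inner_lr / m
for _ in range(3000):
    w1, r = train_one_step(Xs, ys)
    g = X.T @ (X @ w1 - y) / n  # gradient of the real loss w.r.t. the trained weights
    # Chain rule through the training step, derived by hand for this linear model.
    grad_ys = c * (Xs @ g)
    grad_Xs = -c * (np.outer(r, g) + np.outer(Xs @ g, w0))
    ys -= meta_lr * grad_ys
    Xs -= meta_lr * grad_Xs

distilled_w, _ = train_one_step(Xs, ys)
distilled_loss = real_loss(distilled_w)
print(f"loss at fixed initialisation: {baseline_loss:.4f}")
print(f"loss after one step on {m} distilled points: {distilled_loss:.4f}")
```

In this sketch the synthetic points are tuned for one specific starting point `w0`, which is why the initialisation has to be the same when the distilled set is later used for training. A model started elsewhere would not be expected to benefit in the same way.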
Taxonomy
Topics: Speech and Audio Processing
