Knowledge Transfer For On-Device Speech Emotion Recognition with Neural Structured Learning
Yi Chang, Zhao Ren, Thanh Tam Nguyen, Kun Qian, Björn W. Schuller

TL;DR
This paper introduces a neural structured learning framework that leverages synthesized graphs to transfer knowledge for on-device speech emotion recognition, enabling lightweight models with improved performance on edge devices.
Contribution
The paper presents a novel neural structured learning approach using synthesized graphs to enhance transfer learning for speech emotion recognition on resource-constrained edge devices.
Findings
Lightweight models trained with graphs outperform those trained with speech alone.
The proposed method improves SER accuracy compared to traditional transfer learning.
The framework is suitable for deployment on edge devices with limited resources.
Abstract
Speech emotion recognition (SER) has been a popular research topic in human-computer interaction (HCI). As edge devices are rapidly springing up, applying SER to edge devices is promising for a huge number of HCI applications. Although deep learning has been investigated to improve the performance of SER by training complex models, the memory space and computational capability of edge devices represent a constraint for embedding deep learning models. We propose a neural structured learning (NSL) framework through building synthesized graphs. An SER model is trained on a source dataset and used to build graphs on a target dataset. A relatively lightweight model is then trained with the speech samples and graphs together as the input. Our experiments demonstrate that training a lightweight SER model on the target dataset with speech samples and graphs can not only produce small SER…
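The pipeline described in the abstract (build graphs on the target data with a source-trained model, then train a lightweight model on samples plus graphs) can be illustrated with a minimal NumPy sketch. The abstract does not say how the graphs are constructed or how they enter the loss, so everything here is an assumption for illustration: the function names `build_knn_graph` and `graph_regularized_loss`, the cosine-similarity k-nearest-neighbour graph, the squared-distance neighbour penalty, and the weight `alpha` are all hypothetical choices in the spirit of graph-regularized training, not the authors' method.

```python
import numpy as np

def build_knn_graph(embeddings, k=2):
    """Link each sample to its k most cosine-similar samples.

    `embeddings` stands in for the outputs of a model trained on the
    source dataset and applied to target samples (assumed setup).
    """
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)                 # exclude self-matches
    neighbours = np.argsort(-sim, axis=1)[:, :k]   # most similar first
    weights = np.take_along_axis(sim, neighbours, axis=1)
    return neighbours, np.clip(weights, 0.0, None) # keep edge weights non-negative

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def graph_regularized_loss(logits, labels, neighbours, weights, alpha=0.1):
    """Cross-entropy plus a penalty on disagreeing with graph neighbours."""
    probs = softmax(logits)
    n = len(labels)
    cross_entropy = -np.mean(np.log(probs[np.arange(n), labels] + 1e-12))
    # Squared distance between each sample's prediction and its neighbours'.
    diff = probs[:, None, :] - probs[neighbours]   # shape (n, k, classes)
    penalty = np.mean(np.sum(weights * np.sum(diff ** 2, axis=2), axis=1))
    return cross_entropy + alpha * penalty
```

In this sketch, `logits` would come from the lightweight model being trained on the target dataset, and setting `alpha=0` reduces the objective to ordinary supervised training on speech samples alone.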
Taxonomy
Topics: Emotion and Mood Recognition · Speech and Audio Processing · Speech Recognition and Synthesis
