Knowledge Transfer For On-Device Speech Emotion Recognition with Neural Structured Learning

Yi Chang; Zhao Ren; Thanh Tam Nguyen; Kun Qian; Björn W. Schuller

arXiv:2210.14977 · cs.SD · May 12, 2023

TL;DR

This paper introduces a neural structured learning framework that leverages synthesized graphs to transfer knowledge for on-device speech emotion recognition, enabling lightweight models with improved performance on edge devices.

Contribution

The paper presents a novel neural structured learning approach using synthesized graphs to enhance transfer learning for speech emotion recognition on resource-constrained edge devices.

Findings

01 Lightweight models trained with speech samples and graphs together outperform those trained with speech samples alone.

02 The proposed method improves speech emotion recognition (SER) accuracy compared to traditional transfer learning.

03 The framework is suitable for deployment on edge devices with limited memory and computational capability.

Abstract

Speech emotion recognition (SER) has been a popular research topic in human-computer interaction (HCI). As edge devices are rapidly springing up, applying SER to edge devices is promising for a huge number of HCI applications. Although deep learning has been investigated to improve the performance of SER by training complex models, the memory space and computational capability of edge devices represent a constraint for embedding deep learning models. We propose a neural structured learning (NSL) framework through building synthesized graphs. An SER model is trained on a source dataset and used to build graphs on a target dataset. A relatively lightweight model is then trained with the speech samples and graphs together as the input. Our experiments demonstrate that training a lightweight SER model on the target dataset with speech samples and graphs can not only produce small SER…
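The pipeline in the abstract (a source-trained model builds graphs on the target dataset, then a lightweight model is trained on samples and graphs together) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the cosine-similarity k-nearest-neighbour graph, and the squared-difference neighbour penalty are all assumptions chosen to show the general idea of graph-regularized training, and the embeddings are assumed to come from some source-trained model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_knn_graph(embeddings, k):
    """Link each sample to its k most similar other samples.

    `embeddings` stands in for features that a source-trained model
    would produce on target-dataset samples (an assumption of this sketch).
    """
    graph = {}
    for i, e_i in enumerate(embeddings):
        sims = [(cosine(e_i, e_j), j) for j, e_j in enumerate(embeddings) if j != i]
        sims.sort(reverse=True)
        graph[i] = [j for _, j in sims[:k]]
    return graph

def cross_entropy(probs, label):
    """Negative log-likelihood of the true label, clipped for stability."""
    return -math.log(max(probs[label], 1e-12))

def graph_regularized_loss(probs, labels, graph, alpha):
    """Mean supervised loss plus alpha times the mean squared difference
    between each sample's predicted distribution and its neighbours'."""
    n = len(probs)
    supervised = sum(cross_entropy(p, y) for p, y in zip(probs, labels)) / n
    penalty, edges = 0.0, 0
    for i, neighbours in graph.items():
        for j in neighbours:
            penalty += sum((a - b) ** 2 for a, b in zip(probs[i], probs[j]))
            edges += 1
    penalty = penalty / edges if edges else 0.0
    return supervised + alpha * penalty

# Toy data: four samples forming two clusters in embedding space.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
graph = build_knn_graph(embeddings, k=1)

# Hypothetical class probabilities from a lightweight model.
probs = [[0.8, 0.2], [0.6, 0.4], [0.3, 0.7], [0.1, 0.9]]
labels = [0, 0, 1, 1]
loss = graph_regularized_loss(probs, labels, graph, alpha=1.0)
```

In this sketch, `alpha` controls how strongly the lightweight model is pushed to agree with its graph neighbours; setting it to zero recovers plain supervised training on the speech samples alone.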

Code & Models

Repositories

glam-imperial/nsl-ser
PyTorch · Official

Taxonomy

Topics: Emotion and Mood Recognition · Speech and Audio Processing · Speech Recognition and Synthesis