Fast Yet Effective Speech Emotion Recognition with Self-distillation

Zhao Ren; Thanh Tam Nguyen; Yi Chang; Björn W. Schuller

arXiv:2210.14636 · cs.SD · October 27, 2022 · 1 citation

TL;DR

This paper introduces a self-distillation approach for speech emotion recognition that enhances accuracy and efficiency, especially on resource-limited devices, by fine-tuning a pretrained model and its shallower versions simultaneously.

Contribution

It proposes a novel self-distillation framework for SER that improves performance, reduces inference time, and enables adaptive accuracy-efficiency trade-offs on edge devices.

Findings

1. Outperforms existing models on SER datasets.
2. Enables resource-efficient inference with shallower models.
3. Achieves state-of-the-art accuracy with less labeled data.

Abstract

Speech emotion recognition (SER) is the task of recognising humans' emotional states from speech. SER is widely used to help dialogue systems truly understand our emotions and become trustworthy human conversational partners. Due to the lengthy nature of speech, SER also suffers from the lack of abundant labelled data for powerful models like deep neural networks. Complex models pretrained on large-scale speech datasets have been successfully applied to SER via transfer learning. However, fine-tuning complex models still requires large memory space and results in low inference efficiency. In this paper, we argue that achieving fast yet effective SER is possible with self-distillation, a method of simultaneously fine-tuning a pretrained model and training shallower versions of itself. The benefits of our self-distillation framework are threefold: (1) the adoption of…
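
The abstract describes self-distillation only at a high level, so the following is a minimal sketch of one common generic formulation, not the authors' implementation. It assumes the full-depth model acts as the teacher, each shallower exit is a student trained on both the ground-truth label and the teacher's softened output, and the losses are summed. The names `self_distillation_loss`, `alpha`, and `temperature`, and the weighting scheme itself, are hypothetical choices made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy of one sample against an integer class label."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def self_distillation_loss(exit_logits, label, alpha=0.5, temperature=2.0):
    """Combined loss for one sample (illustrative assumption, see lead-in).

    exit_logits: logit vectors ordered shallow -> deep; the last entry
                 is the full-depth model and serves as the teacher.
    alpha:       hypothetical weight on the distillation (soft) term.
    """
    teacher_logits = exit_logits[-1]
    # In a real training loop the teacher distribution would typically be
    # detached so that no gradient flows through it from the student terms.
    teacher_probs = softmax(teacher_logits, temperature)

    # The full model is fine-tuned on the ground-truth label only.
    loss = cross_entropy(teacher_logits, label)

    # Each shallower exit learns from the label and from the teacher.
    for student_logits in exit_logits[:-1]:
        student_probs = softmax(student_logits, temperature)
        hard = cross_entropy(student_logits, label)
        soft = kl_divergence(teacher_probs, student_probs)
        loss += (1 - alpha) * hard + alpha * soft
    return loss


if __name__ == "__main__":
    # Made-up logits for a four-class emotion problem with two exits.
    shallow = [0.3, 0.1, -0.2, 0.0]
    deep = [2.0, 0.5, -1.0, 0.1]
    print(self_distillation_loss([shallow, deep], label=0))
```

At inference time, a design of this kind lets a device run only as many layers as its budget allows and read the prediction from the corresponding exit, which is one way the accuracy-efficiency trade-off mentioned above could be realised.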

Code & Models

Repositories

leibniz-future-lab/selfdistill-ser (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing