Fast Yet Effective Speech Emotion Recognition with Self-distillation
Zhao Ren, Thanh Tam Nguyen, Yi Chang, Björn W. Schuller

TL;DR
This paper introduces a self-distillation approach for speech emotion recognition that enhances accuracy and efficiency, especially on resource-limited devices, by fine-tuning a pretrained model and its shallower versions simultaneously.
Contribution
It proposes a novel self-distillation framework for SER that improves performance, reduces inference time, and enables adaptive accuracy-efficiency trade-offs on edge devices.
Findings
Outperforms existing models on SER datasets.
Enables resource-efficient inference with shallower models.
Achieves state-of-the-art accuracy with less labeled data.
Abstract
Speech emotion recognition (SER) is the task of recognising human emotional states from speech. SER is extremely prevalent in helping dialogue systems truly understand our emotions and become trustworthy human conversational partners. Due to the lengthy nature of speech, SER also suffers from the lack of abundant labelled data for powerful models like deep neural networks. Complex models pre-trained on large-scale speech datasets have been successfully applied to SER via transfer learning. However, fine-tuning complex models still requires large memory space and results in low inference efficiency. In this paper, we argue that achieving fast yet effective SER is possible with self-distillation, a method of simultaneously fine-tuning a pretrained model and training shallower versions of itself. The benefits of our self-distillation framework are threefold: (1) the adoption of…
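The abstract is truncated before it specifies the training objective, so the following is only a minimal sketch of a generic self-distillation loss, not the authors' formulation. It assumes the full-depth model acts as the teacher and each shallower version has its own classifier head; the function names, the `alpha` weighting, and the `temperature` value are all hypothetical choices made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy of one example against its ground-truth class index."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def self_distillation_loss(exit_logits, label, alpha=0.5, temperature=2.0):
    """Combine losses for the full model and its shallower versions.

    exit_logits: list of logit vectors ordered shallowest to deepest;
                 the last entry is assumed to come from the full model.
    alpha, temperature: hypothetical hyperparameters, not from the paper.
    """
    teacher_logits = exit_logits[-1]
    teacher_probs = softmax(teacher_logits, temperature)

    # The full-depth model is fine-tuned on the labels directly.
    loss = cross_entropy(teacher_logits, label)

    # Each shallower version learns from the labels and from the teacher.
    for student_logits in exit_logits[:-1]:
        hard = cross_entropy(student_logits, label)
        student_probs = softmax(student_logits, temperature)
        soft = kl_divergence(teacher_probs, student_probs)
        loss += (1 - alpha) * hard + alpha * temperature ** 2 * soft
    return loss


# Example: two shallow exits plus the full model, four emotion classes.
example_logits = [[0.5, 0.2, 0.1, 0.0], [1.2, 0.1, 0.0, -0.3], [2.5, 0.0, -0.5, -1.0]]
example_loss = self_distillation_loss(example_logits, label=0)
```

In an autograd framework, the teacher's soft targets would usually be excluded from gradient computation so that the shallow heads follow the deep model rather than the reverse; that detail is omitted here because this sketch only computes the loss value.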
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
