Light-SERNet: A lightweight fully convolutional neural network for speech emotion recognition

Arya Aftab; Alireza Morsali; Shahrokh Ghaemmaghami; Benoit Champagne

arXiv:2110.03435 · eess.AS · October 8, 2021 · ICASSP · 1 cites

TL;DR

This paper introduces Light-SERNet, a lightweight fully convolutional neural network (FCNN) for speech emotion recognition. It targets embedded systems with limited resources and achieves high accuracy at a lower computational cost.

Contribution

The paper presents an efficient FCNN architecture with three parallel feature extraction paths that outperforms larger models on the IEMOCAP and EMO-DB datasets.

Findings

01. Smaller model size than state-of-the-art

02. Higher accuracy on IEMOCAP and EMO-DB datasets

03. Effective feature extraction with parallel paths

Abstract

Detecting emotions directly from a speech signal plays an important role in effective human-computer interactions. Existing speech emotion recognition models require massive computational and storage resources, making them hard to implement concurrently with other machine-interactive tasks in embedded systems. In this paper, we propose an efficient and lightweight fully convolutional neural network for speech emotion recognition in systems with limited hardware resources. In the proposed FCNN model, various feature maps are extracted via three parallel paths with different filter sizes. This helps deep convolution blocks to extract high-level features, while ensuring sufficient separability. The extracted features are used to classify the emotion of the input speech segment. While our model has a smaller size than that of the state-of-the-art models, it achieves higher performance on…
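The parallel-path idea in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the kernel shapes in `PATH_KERNEL_SHAPES`, the fixed averaging kernels (a real model would learn its filter weights), and the helper names `conv2d_same`, `averaging_kernel`, and `parallel_paths` are all assumptions made for illustration. The sketch only shows how one input map is filtered by three kernels of different shapes and how the results are stacked as channels for later convolution blocks.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_same(x: Matrix, kernel: Matrix) -> Matrix:
    """Zero-padded 'same' 2D cross-correlation of one channel with one kernel.

    Kernel height and width are assumed to be odd so the output keeps
    the input's shape.
    """
    rows, cols = len(x), len(x[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    r, c = i + a - ph, j + b - pw
                    # Positions outside the input count as zero padding.
                    if 0 <= r < rows and 0 <= c < cols:
                        acc += x[r][c] * kernel[a][b]
            out[i][j] = acc
    return out


def averaging_kernel(height: int, width: int) -> Matrix:
    """Placeholder filter; stands in for learned weights."""
    return [[1.0 / (height * width)] * width for _ in range(height)]


# Hypothetical (height, width) kernel shapes, one per path: a tall kernel,
# a wide kernel, and a square kernel. These are illustrative values only.
PATH_KERNEL_SHAPES = [(3, 1), (1, 3), (3, 3)]


def parallel_paths(x: Matrix) -> List[Matrix]:
    """Apply each path's kernel to the same input and stack the feature maps."""
    return [conv2d_same(x, averaging_kernel(h, w)) for h, w in PATH_KERNEL_SHAPES]


if __name__ == "__main__":
    # Toy 4x5 input standing in for a time-frequency representation.
    toy_input = [[float(r * 5 + c) for c in range(5)] for r in range(4)]
    features = parallel_paths(toy_input)
    # Channels, rows, columns of the stacked output.
    print(len(features), len(features[0]), len(features[0][0]))  # prints: 3 4 5
```

Because every path uses 'same' padding, the three feature maps share the input's shape and can be stacked directly; in this sketch the tall and wide kernels each summarize one axis of the input, while the square kernel mixes both.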

Code & Models

Repositories

aryaaftab/light-sernet · tf · Official

Taxonomy

Topics: Emotion and Mood Recognition · Speech Recognition and Synthesis · Speech and Audio Processing

Methods: Convolution