Biologically inspired speech emotion recognition

Reza Lotfidereshgi; Philippe Gournay

arXiv:2111.08112·eess.AS·November 17, 2021


TL;DR

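This paper introduces a biologically inspired speech emotion recognition method that operates directly on the speech signal. The source and vocal tract components are processed by separate liquid state machine reservoirs, avoiding traditional feature extraction, and the method reports high classification accuracy on Emo-DB.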

Contribution

The method combines the classical source-filter model of speech production with liquid state machines (LSMs) to classify emotions directly from the speech signal, bypassing feature extraction.

Findings

01. Achieved high classification accuracy on Emo-DB

02. Demonstrated effectiveness of SNNs in speech emotion recognition

03. Proposed a novel biologically-inspired framework for speech processing

Abstract

Conventional feature-based classification methods do not apply well to automatic recognition of speech emotions, mostly because the precise set of spectral and prosodic features that is required to identify the emotional state of a speaker has not been determined yet. This paper presents a method that operates directly on the speech signal, thus avoiding the problematic step of feature extraction. Furthermore, this method combines the strengths of the classical source-filter model of human speech production with those of the recently introduced liquid state machine (LSM), a biologically-inspired spiking neural network (SNN). The source and vocal tract components of the speech signal are first separated and converted into perceptually relevant spectral representations. These representations are then processed separately by two reservoirs of neurons. The output of each reservoir is…
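The abstract does not say how the source and vocal tract components are separated. One classical route that fits the source-filter model is linear prediction (LPC): an all-pole filter estimated from a frame stands in for the vocal tract, and the inverse-filtered residual stands in for the source. The sketch below illustrates that general idea only; it is an assumption, not the authors' method, and every function name in it (`lpc`, `inverse_filter`, `synthesis_filter`, `envelope`, `demo_signal`) is hypothetical.

```python
import math
import random


def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(frame, order):
    """Levinson-Durbin recursion.

    Returns a[0..order] with a[0] = 1 such that
    e[n] = sum_k a[k] * x[n - k] is the prediction residual.
    """
    r = autocorrelation(frame, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or degenerate frame: stop early
            break
        acc = sum(a[j] * r[i - j] for j in range(i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i + 1):
            a[j] = prev[j] + k * prev[i - j]
        err *= 1.0 - k * k
    return a


def inverse_filter(x, a):
    """Apply A(z) to x: the residual is the source (excitation) estimate."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(x))]


def synthesis_filter(e, a):
    """Apply the all-pole filter 1/A(z) to e: rebuilds x from the residual."""
    x = []
    for n in range(len(e)):
        past = sum(a[k] * x[n - k] for k in range(1, len(a)) if n - k >= 0)
        x.append(e[n] - past)
    return x


def envelope(a, n_bins):
    """Magnitude of 1/A(e^{jw}) on n_bins frequencies in [0, pi).

    This smooth spectrum is the vocal tract (filter) estimate.
    """
    out = []
    for b in range(n_bins):
        w = math.pi * b / n_bins
        re = sum(a[k] * math.cos(w * k) for k in range(len(a)))
        im = -sum(a[k] * math.sin(w * k) for k in range(len(a)))
        out.append(1.0 / math.hypot(re, im))
    return out


def demo_signal(n, seed=0):
    """Toy 'speech': white noise through a known two-pole resonator."""
    rng = random.Random(seed)
    excitation = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return synthesis_filter(excitation, [1.0, -1.3, 0.8])


if __name__ == "__main__":
    x = demo_signal(2000)
    a = lpc(x, order=2)
    source = inverse_filter(x, a)    # source component
    tract = envelope(a, n_bins=64)   # vocal tract component
    print("estimated coefficients:", [round(c, 3) for c in a])
    print("envelope peak bin:", tract.index(max(tract)))
```

In a full pipeline of the kind the abstract describes, each of the two components would then be converted to a spectral representation and passed to its own reservoir. That stage is left out here because the source gives no details about it.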
