Biologically inspired speech emotion recognition
Reza Lotfidereshgi, Philippe Gournay

TL;DR
This paper introduces a biologically inspired speech emotion recognition method that operates directly on the speech signal using a liquid state machine, achieving high classification accuracy without conventional feature extraction.
Contribution
It combines the classical source-filter model of speech production with liquid state machines (LSMs), a type of spiking neural network, to classify emotions directly from the speech signal, bypassing hand-crafted feature extraction.
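The source-filter separation can be illustrated with linear prediction (LPC), a common way to split a speech frame into an excitation residual (the source) and an all-pole spectral envelope (the vocal tract filter). This summary does not say which separation technique the authors use, so the following is only a minimal sketch under the assumption that LPC is used. The function names, the model order and the synthetic two-pole test frame are all hypothetical.

```python
import cmath
import math
import random


def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """LPC inverse filter A(z) = 1 + sum_k a[k] z^-k via the Levinson-Durbin recursion.

    Returns (a, err): a[0] == 1.0 and err is the final prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient of stage i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def source_signal(frame, a):
    """Inverse-filter the frame with A(z); the residual approximates the source (excitation)."""
    p = len(a) - 1
    return [sum(a[k] * frame[n - k] for k in range(p + 1) if n - k >= 0)
            for n in range(len(frame))]


def vocal_tract_envelope(a, err, n_freqs=64):
    """All-pole filter magnitude sqrt(err) / |A(e^{jw})| at n_freqs points in [0, pi)."""
    gain = math.sqrt(err)
    envelope = []
    for i in range(n_freqs):
        w = math.pi * i / n_freqs
        a_w = sum(a[k] * cmath.exp(-1j * w * k) for k in range(len(a)))
        envelope.append(gain / abs(a_w))
    return envelope


def synthetic_frame(n=4000, seed=0):
    """Hypothetical stand-in for voiced speech: white noise through the resonator
    x[n] = 1.3 x[n-1] - 0.6 x[n-2] + e[n], i.e. A(z) = 1 - 1.3 z^-1 + 0.6 z^-2."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(n):
        x.append(1.3 * x[-1] - 0.6 * x[-2] + rng.gauss(0.0, 1.0))
    return x[2:]


# Usage: recover the two-pole filter and the excitation from the toy frame.
frame = synthetic_frame()
a, err = levinson_durbin(autocorrelation(frame, 2), order=2)
source = source_signal(frame, a)           # excitation (source) estimate
envelope = vocal_tract_envelope(a, err)    # vocal tract (filter) estimate
```

On real speech, LPC is typically applied to short windowed frames with a higher model order; the order of 2 here only matches the toy resonator.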
Findings
Achieved high classification accuracy on the Berlin Database of Emotional Speech (Emo-DB)
Demonstrated the effectiveness of spiking neural networks (SNNs) for speech emotion recognition
Proposed a novel biologically inspired framework for speech processing
Abstract
Conventional feature-based classification methods do not apply well to automatic recognition of speech emotions, mostly because the precise set of spectral and prosodic features that is required to identify the emotional state of a speaker has not been determined yet. This paper presents a method that operates directly on the speech signal, thus avoiding the problematic step of feature extraction. Furthermore, this method combines the strengths of the classical source-filter model of human speech production with those of the recently introduced liquid state machine (LSM), a biologically-inspired spiking neural network (SNN). The source and vocal tract components of the speech signal are first separated and converted into perceptually relevant spectral representations. These representations are then processed separately by two reservoirs of neurons. The output of each reservoir is…
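The pipeline in the abstract, with two spectral representations each driving its own reservoir of spiking neurons, can be sketched as a small liquid state machine. This is a minimal illustration, not the authors' implementation: the leaky integrate-and-fire neuron model, the Bernoulli rate coding of spectral frames, every parameter value (time constants are in simulation steps), and the names `LIFReservoir`, `rate_encode` and `lsm_features` are hypothetical. Because the abstract is cut off before it says what is done with each reservoir's output, the concatenated spike-count readout below is also an assumption.

```python
import math
import random


class LIFReservoir:
    """Fixed, randomly connected pool of leaky integrate-and-fire (LIF) neurons."""

    def __init__(self, n_inputs, n_neurons=100, p_connect=0.1, w_in=1.5,
                 w_rec=0.5, tau=20.0, threshold=1.0, refractory=2, seed=0):
        rng = random.Random(seed)
        self.n = n_neurons
        self.decay = math.exp(-1.0 / tau)   # membrane leak per time step
        self.threshold = threshold
        self.refractory = refractory
        # Sparse random weights stored as (source index, weight) pairs; never trained.
        self.in_conn = [[(j, rng.gauss(0.0, w_in)) for j in range(n_inputs)
                         if rng.random() < p_connect] for _ in range(n_neurons)]
        self.rec_conn = [[(j, rng.gauss(0.0, w_rec)) for j in range(n_neurons)
                          if j != i and rng.random() < p_connect]
                         for i in range(n_neurons)]

    def run(self, input_spikes):
        """Simulate on a sequence of binary input vectors; return per-neuron spike counts."""
        v = [0.0] * self.n
        refr = [0] * self.n
        spikes = [0] * self.n               # spikes emitted at the previous step
        counts = [0] * self.n
        for x in input_spikes:
            new_spikes = [0] * self.n
            for i in range(self.n):
                if refr[i] > 0:             # refractory: ignore input, stay at rest
                    refr[i] -= 1
                    continue
                current = sum(w * x[j] for j, w in self.in_conn[i])
                # Recurrent input arrives with a one-step delay.
                current += sum(w * spikes[j] for j, w in self.rec_conn[i])
                v[i] = v[i] * self.decay + current
                if v[i] >= self.threshold:  # fire, reset, enter refractory period
                    new_spikes[i] = 1
                    counts[i] += 1
                    v[i] = 0.0
                    refr[i] = self.refractory
            spikes = new_spikes
        return counts


def rate_encode(frames, steps_per_frame=5, seed=0):
    """Bernoulli rate coding: each normalized spectral value is a per-step spike probability."""
    rng = random.Random(seed)
    peak = max(max(f) for f in frames) or 1.0
    return [[1 if rng.random() < val / peak else 0 for val in f]
            for f in frames for _ in range(steps_per_frame)]


def lsm_features(source_frames, tract_frames):
    """Drive two separate reservoirs and concatenate their states for a trained readout."""
    res_source = LIFReservoir(len(source_frames[0]), seed=1)
    res_tract = LIFReservoir(len(tract_frames[0]), seed=2)
    return (res_source.run(rate_encode(source_frames, seed=3))
            + res_tract.run(rate_encode(tract_frames, seed=4)))
```

In a liquid state machine only the readout is trained (for example, a linear classifier over the concatenated spike counts); the reservoir weights stay fixed after random initialization, which is why `LIFReservoir` has no learning method.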
