Speech Emotion Recognition Using MFCC Features and LSTM-Based Deep Learning Model
Adelekun Oluwademilade, Ademola Adedamola, Abiola Abdulhakeem, Akinpelu Azeezat, Eraiyetan Israel, Omotosho Oluwadunsin, Ibenye Ikechukwu, Ayuba Muhammad, Olusanya Olamide, Kamorudeen Amuda

TL;DR
This paper presents a speech emotion recognition system that uses MFCC features and an LSTM neural network. The system reaches 99% accuracy on the TESS dataset, demonstrating the effectiveness of deep learning for emotion detection in speech.
Contribution
The study introduces an LSTM-based deep learning model for SER that outperforms traditional SVM classifiers on the TESS dataset.
Findings
The LSTM model achieved 99% accuracy in emotion classification.
MFCC features effectively capture emotional cues in speech.
The LSTM outperforms the SVM baseline on the SER task.
Abstract
Speech Emotion Recognition (SER) is the use of machines to detect a speaker's emotional state from speech, and it is gaining importance in natural human-computer interaction. Speech is a valuable source of information because emotions modify speech patterns such as pitch, energy, and even timing. Nonetheless, SER is not an easy task: speakers differ, recording conditions vary, and certain emotions sound alike. This work introduces a speech emotion recognition system that uses Mel-Frequency Cepstral Coefficients (MFCCs) for feature extraction and a Long Short-Term Memory (LSTM) neural network. Speech signals from the Toronto Emotional Speech Set (TESS) were pre-processed and transformed into MFCC features to capture the important temporal aspects of the signal. The resulting features were then fed to the LSTM model, which is…
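To make the feature-extraction step in the abstract concrete, here is a minimal, dependency-free sketch of how MFCCs can be computed from a waveform. It is an illustration under stated assumptions, not the authors' implementation. The function names and every parameter value (frame length, hop size, number of mel filters, number of coefficients, sample rate) are hypothetical choices and do not come from the paper. In practice a library routine such as `librosa.feature.mfcc` would normally be used instead of a hand-written DFT.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Power spectrum (bins 0..N/2) of one Hamming-windowed frame via a naive DFT."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        spec.append((re * re + im * im) / n)
    return spec

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centre frequencies are evenly spaced on the mel scale."""
    mel_max = hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(mel_max * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def dct2(values, n_coeffs):
    """Unnormalised DCT-II, keeping the first n_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
            for k in range(n_coeffs)]

def mfcc(signal, sample_rate, frame_len=256, hop=128, n_filters=20, n_coeffs=13):
    """Return one list of n_coeffs MFCCs per frame (all defaults are illustrative)."""
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        spec = power_spectrum(signal[start:start + frame_len])
        energies = [sum(w * p for w, p in zip(row, spec)) for row in bank]
        log_energies = [math.log(e + 1e-10) for e in energies]  # offset avoids log(0)
        frames.append(dct2(log_energies, n_coeffs))
    return frames

# Synthetic 440 Hz tone standing in for a speech clip (assumed 8 kHz sample rate).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(1024)]
features = mfcc(tone, sample_rate)
print(len(features), len(features[0]))  # 7 13
```

The result is a sequence of per-frame coefficient vectors (time steps by coefficients), which is the kind of sequential input a recurrent model such as an LSTM consumes one frame at a time.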
