Speech & Song Emotion Recognition Using Multilayer Perceptron and Standard Vector Machine
Behzad Javaheri

TL;DR
This study compares SVM and MLP classifiers for emotion recognition in speech and song. SVM is more accurate before data augmentation (82% vs. 75%), both reach ~79% after augmentation, and both are less accurate on the speech channel than on the song channel.
Contribution
The paper evaluates and compares SVM and MLP for emotion recognition, optimizing features and hyperparameters, and assesses the impact of data augmentation and channel type.
Findings
Before augmentation, the optimised SVM outperforms the MLP, with 82% accuracy versus 75%.
Post-augmentation, both classifiers achieve ~79% accuracy.
Both classifiers perform better on song than speech channels.
Abstract
Herein, we have compared the performance of SVM and MLP in emotion recognition using the speech and song channels of the RAVDESS dataset. We extracted various audio features and identified the optimal scaling strategy and hyperparameters for our models. To increase sample size, we performed audio data augmentation and addressed data imbalance using SMOTE. Our data indicate that the optimised SVM outperforms the MLP, with an accuracy of 82% compared to 75%. Following data augmentation, the performance of both algorithms was identical at ~79%; however, overfitting was evident for the SVM. Our final exploration indicated that the performance of SVM and MLP was similar, with both yielding lower accuracy for the speech channel than for the song channel. Our findings suggest that both SVM and MLP are powerful classifiers for emotion recognition in a vocal-dependent…
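The abstract names SMOTE as the class-balancing step but does not say which implementation was used. As a rough illustration of the idea only, the sketch below creates synthetic minority-class samples by interpolating between a sample and one of its nearest same-class neighbours. The function name `smote_sketch`, its parameters, and the toy feature vectors are assumptions made for this example and are not taken from the paper; in practice a library implementation such as `imblearn.over_sampling.SMOTE` would normally be used instead.

```python
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Simplified illustration of the SMOTE idea (hypothetical helper, not the
    paper's code). Assumes `minority` holds at least two feature vectors.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        # Pick one of the k nearest neighbours at random.
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical 2-D feature vectors for an under-represented emotion class.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_sketch(minority, n_new=6, k=2)
```

Because every synthetic point lies on a line segment between two real minority samples, the new points stay inside the region already covered by that class rather than being exact duplicates of existing samples.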
Taxonomy
Methods: Synthetic Minority Over-sampling Technique · Support Vector Machine
