Evaluating raw waveforms with deep learning frameworks for speech emotion recognition
Zeynep Hilal Kilimci, Ulku Bayraktar, Ayhan Kucukmanisa

TL;DR
This paper introduces a deep learning model that directly processes raw speech audio for emotion recognition, achieving state-of-the-art accuracy across multiple datasets without traditional feature extraction.
Contribution
The study demonstrates that deep neural networks can effectively recognize speech emotions directly from raw audio, surpassing traditional feature-based methods and establishing new performance benchmarks.
Findings
CNN achieves 95.86% accuracy on TESS+RAVDESS
Deep models outperform traditional feature-based methods
State-of-the-art results across multiple datasets
Abstract
Speech emotion recognition is a challenging task in the speech processing field. For this reason, the feature extraction process is crucial for representing and processing speech signals. In this work, we present a model that feeds raw audio files directly into deep neural networks, without any feature extraction stage, to recognize emotions using six different data sets: EMO-DB, RAVDESS, TESS, CREMA, SAVEE, and TESS+RAVDESS. To demonstrate the contribution of the proposed model, traditional feature extraction techniques, namely mel-scale spectrogram and mel-frequency cepstral coefficients, are combined with machine learning algorithms, ensemble learning methods, and deep and hybrid deep learning techniques. Support vector machine, decision tree, naive Bayes, and random forest models are evaluated as machine learning algorithms, while majority voting and…
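As a rough illustration of what "feeding raw audio directly into a network" can look like, the sketch below runs one untrained 1-D convolutional forward pass over waveform samples, with no spectrogram or MFCC step in between. It is not the authors' architecture: the filter count, kernel width, stride, 16 kHz sample rate, seven emotion classes, random weights, and the helper names `conv1d` and `predict_proba` are all assumptions made for the example.

```python
import numpy as np

def conv1d(x, kernels, stride):
    """Valid 1-D cross-correlation of a mono signal.

    x: (n_samples,), kernels: (n_filters, width) -> (n_filters, n_frames)
    """
    width = kernels.shape[1]
    n_frames = (len(x) - width) // stride + 1
    # Index matrix of overlapping windows, shape (n_frames, width).
    idx = np.arange(width)[None, :] + stride * np.arange(n_frames)[:, None]
    return kernels @ x[idx].T

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def predict_proba(waveform, params):
    # Only amplitude normalisation is applied; no hand-crafted features.
    x = waveform / (np.max(np.abs(waveform)) + 1e-8)
    h = np.maximum(conv1d(x, params["kernels"], params["stride"]), 0.0)  # ReLU
    pooled = h.mean(axis=1)  # global average pooling over time
    return softmax(params["W"] @ pooled + params["b"])

rng = np.random.default_rng(0)
n_classes = 7  # hypothetical number of emotion labels
params = {
    "kernels": rng.normal(scale=0.1, size=(16, 400)),  # 16 filters, 400 samples wide
    "stride": 160,                                     # hop between windows, in samples
    "W": rng.normal(scale=0.1, size=(n_classes, 16)),  # untrained classifier weights
    "b": np.zeros(n_classes),
}

waveform = rng.normal(size=16000)  # 1 s of synthetic noise at an assumed 16 kHz
probs = predict_proba(waveform, params)
print(probs.shape, float(probs.sum()))
```

In a real system the convolution kernels and classifier weights would be learned from labeled recordings. The point here is only that the first layer consumes samples directly, so any filterbank-like representation has to be learned rather than fixed in advance.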
Taxonomy
Topics: Speech and Audio Processing · Emotion and Mood Recognition · Music and Audio Processing
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
