Enhancing Speech Emotion Recognition using Dynamic Spectral Features and Kalman Smoothing

Marouane El Hizabri; Abdelfattah Bezzaz; Ismail Hayoukane; Youssef Taki

arXiv:2601.18908 · cs.SD · January 28, 2026

TL;DR

This paper improves speech emotion recognition accuracy by integrating dynamic spectral features and Kalman Smoothing to reduce noise and stabilize classifier outputs, achieving state-of-the-art results.

Contribution

It introduces the combination of dynamic spectral features with Kalman Smoothing for enhanced emotion recognition in noisy conditions.

Findings

01. Achieved 87% accuracy on the RAVDESS dataset.

02. Reduced misclassification between emotions with similar acoustic features.

03. Enhanced stability of emotion classification over time.
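
The stability finding can be illustrated with a minimal sketch of Kalman smoothing applied to a per-frame classifier score. This page does not give the paper's state model or noise settings, so everything here is an assumption: a scalar random-walk state, the hypothetical function name `kalman_smooth`, and the variances `q` (process) and `r` (measurement). The forward pass is a standard Kalman filter and the backward pass is a Rauch-Tung-Striebel smoother.

```python
from typing import List, Sequence


def kalman_smooth(z: Sequence[float], q: float = 1e-3, r: float = 1e-1) -> List[float]:
    """Smooth a 1-D sequence with a random-walk Kalman filter plus RTS backward pass.

    z: noisy observations (e.g. one emotion's score per frame; an assumption).
    q: assumed process-noise variance (how fast the true value may drift).
    r: assumed measurement-noise variance (how noisy each observation is).
    """
    n = len(z)
    if n == 0:
        return []

    x_pred = [0.0] * n  # predicted means
    p_pred = [0.0] * n  # predicted variances
    x_filt = [0.0] * n  # filtered means
    p_filt = [0.0] * n  # filtered variances

    x, p = float(z[0]), 1.0  # initial state guess and its variance
    for t in range(n):
        # Predict: under a random walk the mean carries over, variance grows by q.
        x_pred[t] = x
        p_pred[t] = p + q
        # Update: blend the prediction with the new observation.
        gain = p_pred[t] / (p_pred[t] + r)
        x = x_pred[t] + gain * (z[t] - x_pred[t])
        p = (1.0 - gain) * p_pred[t]
        x_filt[t], p_filt[t] = x, p

    # Backward (RTS) pass: refine each estimate using later frames.
    x_smooth = x_filt[:]
    for t in range(n - 2, -1, -1):
        g = p_filt[t] / p_pred[t + 1]
        x_smooth[t] = x_filt[t] + g * (x_smooth[t + 1] - x_pred[t + 1])
    return x_smooth


if __name__ == "__main__":
    scores = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]  # made-up, rapidly flipping scores
    print(kalman_smooth(scores))
```

One way to use such a smoother on multi-class outputs is to run it independently on each class's score sequence and renormalize per frame; whether the paper does this is not stated here.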

Abstract

Speech Emotion Recognition systems often rely on static features such as Mel-Frequency Cepstral Coefficients (MFCCs), Zero Crossing Rate (ZCR), and Root Mean Square Energy (RMSE). As a result, they can misclassify emotions when the vocal signal contains acoustic noise. To address this, we add dynamic spectral features (Deltas and Delta-Deltas) and apply the Kalman Smoothing algorithm. This approach reduces noise and improves emotion classification. Because emotion changes over time, the Kalman smoother also makes the classifier outputs more stable. Tests on the RAVDESS dataset show that the method achieves a state-of-the-art accuracy of 87% and reduces misclassification between emotions with similar acoustic features.
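
The Deltas and Delta-Deltas named in the abstract can be sketched as follows. This uses the common regression-based delta formula, d_t = sum_{n=1..N} n (c_{t+n} - c_{t-n}) / (2 sum_{n=1..N} n^2), with edge frames repeated. The window half-width `width=2`, the edge handling, and the function names `delta` and `add_dynamic_features` are assumptions for illustration, not details taken from the paper.

```python
from typing import List, Sequence


def delta(frames: Sequence[Sequence[float]], width: int = 2) -> List[List[float]]:
    """Regression-based delta of a (num_frames x num_coeffs) feature matrix.

    width: assumed half-window N; frames beyond the edges are clamped
    to the first or last frame.
    """
    if width < 1:
        raise ValueError("width must be >= 1")
    num_frames = len(frames)
    denom = 2.0 * sum(n * n for n in range(1, width + 1))
    out: List[List[float]] = []
    for t in range(num_frames):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                ahead = frames[min(t + n, num_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def add_dynamic_features(frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Stack static features with their Deltas and Delta-Deltas per frame."""
    d1 = delta(frames)  # first-order change over time
    d2 = delta(d1)      # second-order change (delta of the delta)
    return [list(f) + a + b for f, a, b in zip(frames, d1, d2)]


if __name__ == "__main__":
    # Made-up single-coefficient "MFCC" track that rises linearly over 6 frames.
    static = [[float(i)] for i in range(6)]
    for row in add_dynamic_features(static):
        print(row)
```

Each output row holds the static coefficients followed by their Deltas and Delta-Deltas, so a 13-coefficient MFCC frame would become a 39-dimensional vector under this stacking.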

Taxonomy

Topics: Emotion and Mood Recognition · Music and Audio Processing · Speech Recognition and Synthesis