Feature Selection Enhancement and Feature Space Visualization for Speech-Based Emotion Recognition

Sofia Kanwal; Sohail Asghar; Hazrat Ali

arXiv:2208.09269 · eess.SP · August 22, 2022

Open Access

TL;DR

This paper introduces a feature enhancement and visualization method for speech emotion recognition: PCA is applied to selected feature subsets, the reduced subsets are fused, and the resulting feature space is inspected with t-SNE. The method improves recognition accuracy on two datasets, one German and one English.

Contribution

The study presents a feature enhancement strategy that combines subset-wise PCA with horizontal feature fusion, together with t-SNE visualization of the resulting feature space, to improve speech emotion recognition accuracy.

Findings

01

Achieved an 11.5% average recognition gain on the EMO-DB dataset.

02

Achieved a 13.8% average recognition gain on the RAVDESS dataset.

03

Validated the method on German (EMO-DB) and English (RAVDESS) emotional speech datasets.

Abstract

Robust speech emotion recognition relies on the quality of the speech features. We present a speech feature enhancement strategy that improves speech emotion recognition. We use the INTERSPEECH 2010 challenge feature set. We identify subsets of the feature set and apply Principal Component Analysis (PCA) to each subset. Finally, the reduced features are fused horizontally. The resulting feature set is analyzed using t-distributed stochastic neighbour embedding (t-SNE) before the features are used for emotion recognition. The method is compared with state-of-the-art methods from the literature. The empirical evidence is drawn from two well-known datasets: Emotional Speech Dataset (EMO-DB) and Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), for two languages, German and English, respectively. Our method achieved an average recognition gain of 11.5% for six out of seven…
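The pipeline described in the abstract (PCA on each feature subset, then horizontal fusion) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the random data, the column grouping in `subsets`, the number of retained components, and the helper names `pca_reduce` and `enhance_features` are all assumptions made for the example, since the abstract does not say how the subsets are chosen or how many components are kept.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project the rows of X onto its top n_components principal directions."""
    Xc = X - X.mean(axis=0)  # center each feature column
    # SVD of the centered data: the rows of Vt are the principal directions
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def enhance_features(X, subsets, n_components):
    """Apply PCA to each column subset separately, then fuse horizontally."""
    reduced = [pca_reduce(X[:, cols], n_components) for cols in subsets]
    return np.hstack(reduced)  # horizontal fusion: concatenate along columns


# Placeholder data standing in for per-utterance acoustic features
# (hypothetical sizes: 50 utterances, 12 features).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 12))

# Hypothetical grouping of feature columns into three subsets.
subsets = [slice(0, 4), slice(4, 8), slice(8, 12)]

fused = enhance_features(X, subsets, n_components=2)
print(fused.shape)  # (50, 6): 3 subsets x 2 components each
```

For the visualization step, the fused matrix could then be passed to an off-the-shelf t-SNE implementation (for example scikit-learn's `sklearn.manifold.TSNE` with `n_components=2`) and the 2-D embedding plotted with one colour per emotion label.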

Taxonomy

Topics: Emotion and Mood Recognition · Speech Recognition and Synthesis · Speech and Audio Processing