CASA-Based Speaker Identification Using Cascaded GMM-CNN Classifier in Noisy and Emotional Talking Conditions
Ali Bou Nassif, Ismail Shahin, Shibani Hamsa, Nawel Nemmour, Keikichi, Hirose

TL;DR
This paper presents a novel CASA-based pre-processing and cascaded GMM-CNN classifier to improve speaker identification accuracy in noisy and emotional conditions, demonstrating promising results across multiple databases.
Contribution
It introduces a new algorithm combining CASA noise reduction with a cascaded GMM-CNN classifier for enhanced speaker identification in challenging environments.
Findings
Improved accuracy in noisy and emotional conditions.
Effective across multiple speech databases.
Outperforms other classifiers in tested scenarios.
Abstract
This work aims at intensifying text-independent speaker identification performance in real application situations such as noisy and emotional talking conditions. This is achieved by incorporating two different modules: a Computational Auditory Scene Analysis CASA based pre-processing module for noise reduction and cascaded Gaussian Mixture Model Convolutional Neural Network GMM-CNN classifier for speaker identification followed by emotion recognition. This research proposes and evaluates a novel algorithm to improve the accuracy of speaker identification in emotional and highly-noise susceptible conditions. Experiments demonstrate that the proposed model yields promising results in comparison with other classifiers when Speech Under Simulated and Actual Stress SUSAS database, Emirati Speech Database ESD, the Ryerson Audio-Visual Database of Emotional Speech and Song RAVDESS database and…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
