Novel Hybrid DNN Approaches for Speaker Verification in Emotional and Stressful Talking Environments

Ismail Shahin; Ali Bou Nassif; Nawel Nemmour; Ashraf Elnagar; Adi Alhudhaif; Kemal Polat

arXiv:2112.13353·cs.SD·December 28, 2021

TL;DR

This paper compares hybrid deep neural network (DNN) models for speaker verification in emotional and stressful environments. It finds HMM-DNN to be the most effective but also computationally intensive, with performance varying across datasets.

Contribution

Introduces and empirically evaluates four novel hybrid DNN-based models for speaker verification in challenging emotional and stressful talking environments.

Findings

1. HMM-DNN outperforms the other hybrid models in equal error rate (EER) and area under the ROC curve (AUC).
2. DNN-GMM has the lowest computational complexity.
3. Performance varies depending on the dataset used.
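The findings are stated in terms of EER and AUC, the standard metrics for scoring a speaker verification system. The sketch below shows how both can be computed from trial scores. It is not the authors' evaluation code. It assumes each trial yields a similarity score where higher means "same speaker", and that a trial is accepted when its score is at least the decision threshold. EER is approximated at the observed threshold where the false-acceptance and false-rejection rates are closest, rather than by interpolation. The helper names and score values are hypothetical, chosen only for illustration.

```python
import numpy as np


def error_rates(genuine, impostor):
    """Return thresholds with the matching false-acceptance and false-rejection rates.

    Assumption: higher score = more likely the claimed speaker; accept if score >= threshold.
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    # Every observed score is a candidate threshold; +inf rejects every trial.
    thresholds = np.concatenate([np.unique(np.concatenate([genuine, impostor])), [np.inf]])
    far = np.array([(impostor >= t).mean() for t in thresholds])  # impostors wrongly accepted
    frr = np.array([(genuine < t).mean() for t in thresholds])    # true speakers wrongly rejected
    return thresholds, far, frr


def equal_error_rate(genuine, impostor):
    """Approximate EER: average of FAR and FRR at the threshold where they are closest."""
    _, far, frr = error_rates(genuine, impostor)
    i = int(np.argmin(np.abs(far - frr)))
    return float((far[i] + frr[i]) / 2.0)


def roc_auc(genuine, impostor):
    """AUC as P(random genuine score > random impostor score), with ties counted as half."""
    g = np.asarray(genuine, dtype=float)[:, None]
    imp = np.asarray(impostor, dtype=float)[None, :]
    return float(np.mean((g > imp) + 0.5 * (g == imp)))


# Hypothetical scores: four target trials and four impostor trials.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.1, 0.3, 0.5, 0.6]
print(equal_error_rate(genuine_scores, impostor_scores))  # 0.25
print(roc_auc(genuine_scores, impostor_scores))           # 0.875
```

A lower EER and a higher AUC both indicate better separation between target and impostor trials. This is the sense in which one hybrid model can outperform another on these metrics.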

Abstract

In this work, we conducted an empirical comparative study of the performance of text-independent speaker verification in emotional and stressful environments. This work combined deep models with shallow architectures, resulting in novel hybrid classifiers. Four distinct hybrid models were utilized: deep neural network-hidden Markov model (DNN-HMM), deep neural network-Gaussian mixture model (DNN-GMM), Gaussian mixture model-deep neural network (GMM-DNN), and hidden Markov model-deep neural network (HMM-DNN). All models were based on a newly implemented architecture. The comparative study used three distinct speech datasets: a private Arabic dataset and two public English databases, namely, Speech Under Simulated and Actual Stress (SUSAS) and Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). The test results of the aforementioned hybrid models demonstrated that the…
