Speaker Identification from emotional and noisy speech data using learned voice segregation and Speech VGG

Shibani Hamsa, Ismail Shahin, Youssef Iraqi, Ernesto Damiani, and Naoufel Werghi

arXiv:2210.12701 · eess.AS · October 25, 2022 · 1 citation

Open Access

TL;DR

This paper introduces a speaker identification method for emotional and noisy speech that uses a pre-trained DNN mask and Speech VGG to segregate the target speech, reporting average identification rates above 85% on multiple datasets.

Contribution

It presents a new approach combining a pre-trained DNN mask with Speech VGG for robust speaker identification in adverse conditions, outperforming recent methods.

Findings

01

Achieved over 85% accuracy on multiple emotional and noisy speech datasets.

02

Outperformed recent literature in speaker identification under challenging conditions.

03

Effective on both English and Arabic speech data.

Abstract

Speech signals are subjected to more acoustic interference and emotional factors than other signals. Noisy emotion-riddled speech data is a challenge for real-time speech processing applications. It is essential to find an effective way to segregate the dominant signal from other external influences. An ideal system should have the capacity to accurately recognize required auditory events from a complex scene taken in an unfavorable situation. This paper proposes a novel approach to speaker identification in unfavorable conditions such as emotion and interference using a pre-trained Deep Neural Network mask and Speech VGG. The proposed model obtained superior performance over the recent literature in English and Arabic emotional speech data and reported an average speaker identification rate of 85.2%, 87.0%, and 86.6% using the Ryerson audio-visual dataset (RAVDESS), speech under…
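To make the idea of mask-based segregation concrete, here is a minimal sketch of how a time-frequency mask can be applied to a noisy magnitude spectrogram. It is not the paper's method: the abstract says the mask comes from a pre-trained DNN, whereas this sketch assumes the clean speech and noise magnitudes are known and computes a simple ratio mask (speech energy over speech-plus-noise energy) as a stand-in. The function names and the toy values are hypothetical.

```python
# Illustrative only: an oracle ratio mask stands in for the DNN-predicted mask.

def ratio_mask(speech_mag, noise_mag, eps=1e-8):
    """Per-bin weight in [0, 1]: speech energy over speech-plus-noise energy."""
    return [
        [s * s / (s * s + n * n + eps) for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_mag, noise_mag)
    ]

def apply_mask(mixture_mag, mask):
    """Scale each time-frequency bin of the mixture by its mask weight."""
    return [
        [m * w for m, w in zip(m_row, w_row)]
        for m_row, w_row in zip(mixture_mag, mask)
    ]

# Toy magnitudes: 2 frames x 3 frequency bins (made-up numbers).
speech = [[1.0, 0.0, 2.0], [0.5, 0.5, 0.0]]
noise = [[0.0, 1.0, 2.0], [0.5, 0.0, 1.0]]
# Assumption: magnitudes add in the mixture, which ignores phase.
mixture = [[s + n for s, n in zip(s_row, n_row)]
           for s_row, n_row in zip(speech, noise)]

mask = ratio_mask(speech, noise)
segregated = apply_mask(mixture, mask)
```

Bins dominated by speech keep most of their energy, while bins dominated by noise are attenuated toward zero. In a learned system, the mask would be estimated from the noisy input alone.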

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing

Methods: Dense Connections · Softmax · Convolution · Dropout · Max Pooling