The exploitation of Multiple Feature Extraction Techniques for Speaker Identification in Emotional States under Disguised Voices

Noor Ahmad Al Hindawi; Ismail Shahin; Ali Bou Nassif

arXiv:2112.07940·cs.SD·December 16, 2021

TL;DR

This study investigates multiple feature extraction techniques for speaker identification in emotional and disguised voices, demonstrating that concatenated MFCCs and their derivatives yield the best performance under various voice alteration effects.

Contribution

It explores and compares five feature extraction methods specifically for speaker identification in disguised and emotional voices, highlighting the effectiveness of concatenated MFCCs and derivatives.

Findings

01

Concatenated MFCCs and derivatives outperform other methods.

02

High-pitched, low-pitched, and EVC effects impact speaker identification accuracy.

03

MFCC-based features are most effective for disguised voice identification.

Abstract

Advances in artificial intelligence have driven rapid progress in speaker identification (SI) technologies, which are now widely used in a variety of sectors. Feature extraction is one of the most important components of SI and has a substantial impact on the SI process and its performance. For this reason, numerous feature extraction strategies are thoroughly investigated, contrasted, and analyzed. This article applies five distinct feature extraction methods to speaker identification in disguised voices under emotional environments. To evaluate the work, three effects are used: high-pitched, low-pitched, and Electronic Voice Conversion (EVC). Experimental results show that the concatenation of Mel-Frequency Cepstral Coefficients (MFCCs), MFCCs-delta, and MFCCs-delta-delta is the best feature extraction method.
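The concatenated feature described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes an MFCC matrix has already been computed, and the 13 coefficients, the regression half-width of 2 frames, the edge padding, and the function names `delta` and `stack_mfcc_deltas` are all assumptions made here for illustration. The random matrix stands in for real MFCCs.

```python
import numpy as np


def delta(features, width=2):
    """Regression-based delta of a (n_coeffs, n_frames) feature matrix.

    Uses d_t = sum_n n * (c[t+n] - c[t-n]) / (2 * sum_n n^2) for n = 1..width,
    repeating the first and last frames at the edges (an assumed choice).
    """
    n_frames = features.shape[1]
    padded = np.pad(features, ((0, 0), (width, width)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, width + 1))
    out = np.zeros(features.shape, dtype=float)
    for n in range(1, width + 1):
        ahead = padded[:, width + n : width + n + n_frames]
        behind = padded[:, width - n : width - n + n_frames]
        out += n * (ahead - behind)
    return out / denom


def stack_mfcc_deltas(mfcc, width=2):
    """Concatenate MFCCs, their deltas, and their delta-deltas per frame."""
    d1 = delta(mfcc, width)   # first-order (delta) coefficients
    d2 = delta(d1, width)     # second-order (delta-delta) coefficients
    return np.vstack([mfcc, d1, d2])


# Placeholder input: 13 coefficients over 100 frames (random, not real speech).
rng = np.random.default_rng(0)
mfcc = rng.standard_normal((13, 100))
features = stack_mfcc_deltas(mfcc)
print(features.shape)  # (39, 100)
```

Each frame then carries the static coefficients plus two orders of temporal dynamics, which is the general idea behind combining MFCCs with their derivatives; the exact settings used in the paper are not stated in this summary.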

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing