Deep Learning for Speaker Identification: Architectural Insights from AB-1 Corpus Analysis and Performance Evaluation

Matthias Bartolo

arXiv:2408.06804·cs.SD·October 29, 2024

TL;DR

This paper investigates various deep learning architectures for speaker identification, focusing on feature extraction methods like Mel Spectrogram and MFCC, and evaluates their performance and biases using the AB-1 Corpus.

Contribution

It provides a comprehensive analysis of six deep learning models for SID, including hyperparameter tuning and bias assessment on the AB-1 Corpus.

Findings

01 Identified the most effective model architecture for SID

02 Demonstrated the impact of feature extraction methods on accuracy

03 Evaluated identification accuracy by gender and accent, and assessed bias in speaker identification
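
One common way to examine gender or accent effects is to compare identification accuracy across groups. The sketch below illustrates that idea only; it is not the paper's evaluation code. The function name `accuracy_by_group`, the speaker labels, and the toy predictions are all hypothetical and do not come from the AB-1 Corpus.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return {group: accuracy} for predictions split by a group attribute."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


# Hypothetical toy data: true speaker IDs, predicted IDs, and speaker gender.
y_true = ["s1", "s2", "s3", "s4", "s5", "s6"]
y_pred = ["s1", "s2", "s1", "s4", "s5", "s6"]
gender = ["F", "F", "F", "M", "M", "M"]

per_gender = accuracy_by_group(y_true, y_pred, gender)
# A simple disparity measure: the gap between the best and worst group.
gap = max(per_gender.values()) - min(per_gender.values())
print(per_gender, gap)
```

The same function can be called with an accent label per utterance in place of `gender`. A large gap between groups would suggest that the model, or the data it was trained on, favours some speakers over others.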

Abstract

In security systems, forensic investigations, and personalized services, the importance of speech as a fundamental human input outweighs that of text-based interaction. This research examines the essential components of Speaker Identification (SID), emphasising Mel Spectrogram and Mel Frequency Cepstral Coefficients (MFCC) for feature extraction. It evaluates six slightly distinct model architectures and compares their performance, with hyperparameter tuning applied to the best-performing model. It also performs a linguistic analysis to verify accent and gender accuracy, together with a bias evaluation within the AB-1 Corpus dataset.
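
The two feature types named in the abstract are closely related: a log-mel spectrogram frame is a power spectrum pooled through mel-spaced triangular filters, and MFCCs are a discrete cosine transform of that frame. The following is a minimal standard-library sketch of that relationship for a single frame, not the paper's pipeline. The frame length, sample rate, number of mel bands, number of coefficients, and the test tone are all assumed values chosen for illustration. In practice a library routine such as `librosa.feature.mfcc` would normally be used instead.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Hamming-window one frame and return its one-sided power spectrum (naive DFT)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed))
        spectrum.append(abs(s) ** 2 / n)
    return spectrum


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2)
    # n_mels + 2 edge frequencies, expressed as fractional FFT-bin positions.
    edges = [mel_to_hz(mel_max * i / (n_mels + 1)) * n_fft / sample_rate
             for i in range(n_mels + 2)]
    bank = []
    for m in range(1, n_mels + 1):
        left, centre, right = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for b in range(n_bins):
            rising = (b - left) / (centre - left)
            falling = (right - b) / (right - centre)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def log_mel_frame(frame, bank):
    """One column of a log-mel spectrogram."""
    spec = power_spectrum(frame)
    return [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10) for row in bank]


def mfcc_frame(log_mel, n_mfcc):
    """Unnormalised DCT-II of the log-mel energies; keep the first n_mfcc terms."""
    n = len(log_mel)
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n)
                for j, e in enumerate(log_mel))
            for c in range(n_mfcc)]


# Assumed demo settings: a 256-sample frame of a 440 Hz tone sampled at 16 kHz.
SAMPLE_RATE, N_FFT, N_MELS, N_MFCC = 16000, 256, 10, 5
frame = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(N_FFT)]
bank = mel_filterbank(N_MELS, N_FFT, SAMPLE_RATE)
log_mel = log_mel_frame(frame, bank)
mfcc = mfcc_frame(log_mel, N_MFCC)
print(len(log_mel), len(mfcc))
```

Stacking `log_mel` vectors over successive frames gives a mel spectrogram that a convolutional model can treat as an image, while the `mfcc` vectors give a more compact and decorrelated representation of the same information.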

Code & Models

Repositories

mbar0075/speech-technology (tf, Official)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing