Deep Learning for Speaker Identification: Architectural Insights from AB-1 Corpus Analysis and Performance Evaluation
Matthias Bartolo

TL;DR
This paper investigates various deep learning architectures for speaker identification, focusing on feature extraction methods like Mel Spectrogram and MFCC, and evaluates their performance and biases using the AB-1 Corpus.
Contribution
It provides a comprehensive analysis of six deep learning models for speaker identification (SID), including hyperparameter tuning of the best-performing model and bias assessment on the AB-1 Corpus.
Findings
Identified the most effective model architecture for SID
Demonstrated the impact of feature extraction methods on accuracy
Evaluated gender, accent, and bias effects in speaker identification
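The gender and accent evaluation listed above comes down to comparing identification accuracy across speaker groups. The following is a minimal sketch of that comparison, not the paper's evaluation code: the record layout, the field names, and the toy records are all made up for illustration and are not AB-1 Corpus results.

```python
from collections import defaultdict


def accuracy_by_group(records, group_key):
    """Return {group: accuracy} for predictions grouped by records[i][group_key]."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for rec in records:
        group = rec[group_key]
        totals[group] += 1
        hits[group] += int(rec["predicted"] == rec["actual"])
    return {group: hits[group] / totals[group] for group in totals}


def accuracy_gap(per_group):
    """Largest difference in accuracy between any two groups (a simple bias signal)."""
    return max(per_group.values()) - min(per_group.values())


# Hypothetical toy predictions; speaker IDs and group labels are placeholders.
toy_records = [
    {"actual": "spk1", "predicted": "spk1", "gender": "female", "accent": "accent_a"},
    {"actual": "spk1", "predicted": "spk2", "gender": "female", "accent": "accent_a"},
    {"actual": "spk2", "predicted": "spk2", "gender": "male", "accent": "accent_b"},
    {"actual": "spk3", "predicted": "spk3", "gender": "male", "accent": "accent_a"},
]

by_gender = accuracy_by_group(toy_records, "gender")
by_accent = accuracy_by_group(toy_records, "accent")
gender_gap = accuracy_gap(by_gender)
```

A large gap between groups would suggest that the model, or the balance of the training data, favours some speakers over others, which is the kind of effect a bias assessment looks for.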
Abstract
In security systems, forensic investigations, and personalized services, speech is a fundamental human input whose importance outweighs that of text-based interaction. This research examines the essential components of Speaker Identification (SID), emphasising Mel Spectrogram and Mel Frequency Cepstral Coefficients (MFCC) for feature extraction. The study evaluates the performance of six slightly distinct model architectures, with hyperparameter tuning applied to the best-performing model. It also performs a linguistic analysis to verify accent and gender accuracy, together with a bias evaluation within the AB-1 Corpus dataset.
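To make the two feature types concrete, here is a dependency-free sketch of how log mel energies and MFCCs can be computed for a single audio frame. It is an illustration under stated assumptions, not the paper's pipeline: the sample rate, frame length, filter count, coefficient count, and the HTK-style mel formula are all assumed values, and a real system would normally use an audio library rather than a naive DFT.

```python
import cmath
import math


def hz_to_mel(hz):
    """Hz -> mel, using the common HTK-style formula (an assumed convention)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Power spectrum (bins 0..N/2) of one Hamming-windowed frame via a naive DFT."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    mel_max = hz_to_mel(sample_rate / 2)
    edges_hz = [mel_to_hz(mel_max * i / (n_filters + 1))
                for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        bank.append([max(0.0, min((f - lo) / (mid - lo), (hi - f) / (hi - mid)))
                     for f in bin_hz])
    return bank


def log_mel_energies(frame, sample_rate, n_filters=20):
    """One column of a log mel spectrogram: log energy in each mel band."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Small floor avoids log(0) for empty bands.
    return [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
            for row in bank]


def mfcc(frame, sample_rate, n_filters=20, n_coeffs=13):
    """MFCCs: unnormalised DCT-II of the log mel energies, truncated to n_coeffs."""
    log_mel = log_mel_energies(frame, sample_rate, n_filters)
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_mel))
            for c in range(n_coeffs)]


# Usage on a synthetic 440 Hz tone (assumed 16 kHz sample rate, 256-sample frame).
SAMPLE_RATE = 16000
frame = [math.sin(2 * math.pi * 440 * i / SAMPLE_RATE) for i in range(256)]
log_mel = log_mel_energies(frame, SAMPLE_RATE)
coeffs = mfcc(frame, SAMPLE_RATE)
peak_band = max(range(len(log_mel)), key=lambda j: log_mel[j])
```

Stacking `log_mel` vectors from successive frames gives a Mel Spectrogram, a time-frequency image suited to convolutional models. The MFCC step compresses each column into a few decorrelated coefficients, trading spectral detail for a more compact input.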
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
