Neural Network Based Speaker Classification and Verification Systems with Enhanced Features
Zhenhao Ge, Ananth N. Iyer, Srinath Cheluvaraja, Ram Sundaram, Aravind Ganapathiraju

TL;DR
This paper introduces a feed-forward neural network framework for speaker recognition that reaches a 100% classification rate on TIMIT and an Equal Error Rate below 6% in verification, using optimized features, model training, and score normalization.
Contribution
It presents a novel neural network-based speaker recognition system with optimized features, training methods, and normalization techniques that improve performance over previous approaches.
Findings
Achieved 100% classification rate on TIMIT dataset.
Less than 6% Equal Error Rate in speaker verification.
Enhanced features and normalization significantly improve system performance.
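For readers unfamiliar with the verification metric, the Equal Error Rate is the operating point where the false acceptance rate equals the false rejection rate. The sketch below is a generic illustration, not the authors' evaluation code. It assumes higher scores mean "same speaker" and sweeps the observed scores as candidate thresholds. The function name `equal_error_rate` and the toy scores are hypothetical.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    A trial is accepted when score >= threshold (assumption: higher = more
    likely the claimed speaker). Returns (eer, threshold).
    """
    best = None
    for threshold in sorted(set(genuine_scores) | set(impostor_scores)):
        # False rejection rate: genuine trials that fall below the threshold.
        frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
        # False acceptance rate: impostor trials at or above the threshold.
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, threshold)
    return best[1], best[2]


# Toy scores, made up for illustration only.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(genuine, impostor)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With finitely many trials the two error rates may never match exactly, so the sketch reports their average at the closest point.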
Abstract
This work presents a novel framework based on a feed-forward neural network for text-independent speaker classification and verification, two related systems of speaker recognition. With optimized features and model training, it achieves a 100% classification rate and less than 6% Equal Error Rate (EER), using merely about 1 second and 5 seconds of data respectively. Features extracted with a stricter Voice Activity Detection (VAD) than the regular one used for speech recognition retain the more strongly voiced portions for speaker recognition, and speaker-level mean and variance normalization helps to eliminate the discrepancy between samples from the same speaker. Both are shown to improve the system performance. In building the neural network speaker classifier, the network structure parameters are optimized with grid search and dynamically reduced regularization parameters are used to…
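The speaker-level mean and variance normalization mentioned in the abstract can be illustrated with a minimal sketch. It assumes statistics are pooled over all frames of a speaker and that the population standard deviation is used. The abstract does not specify either detail, so this is not necessarily the authors' exact procedure. The names `speaker_level_mvn`, `frames_by_speaker`, and `eps` are hypothetical.

```python
from statistics import fmean, pstdev


def speaker_level_mvn(frames_by_speaker, eps=1e-8):
    """Scale each feature dimension to zero mean, unit variance per speaker.

    frames_by_speaker maps a speaker ID to a list of feature frames, where
    each frame is a list of floats of equal length (assumed layout).
    """
    normalized = {}
    for speaker, frames in frames_by_speaker.items():
        # Transpose so each column holds one feature dimension over all frames.
        columns = list(zip(*frames))
        means = [fmean(col) for col in columns]
        stds = [pstdev(col) for col in columns]
        # eps guards against division by zero for constant dimensions.
        normalized[speaker] = [
            [(x - m) / (s + eps) for x, m, s in zip(frame, means, stds)]
            for frame in frames
        ]
    return normalized


# Toy two-dimensional features for two speakers, made up for illustration.
features = {
    "spk1": [[1.0, 10.0], [3.0, 30.0]],
    "spk2": [[100.0, 0.0], [200.0, 4.0]],
}
print(speaker_level_mvn(features))
```

After normalization both toy speakers occupy the same numeric range, even though their raw features differ by orders of magnitude.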
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
