Deep Learning for Single and Multi-Session i-Vector Speaker Recognition
Omid Ghahabi, Javier Hernando

TL;DR
This paper explores deep learning techniques, specifically Deep Belief Networks (DBN) and Deep Neural Networks (DNN), as a backend for i-vector speaker recognition, reporting performance gains over baseline systems on the NIST SRE 2006 corpus.
Contribution
It introduces an impostor selection algorithm and a universal model adaptation process for deep learning-based speaker recognition, improving accuracy over baseline systems.
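The summary does not describe how impostors are chosen. The sketch below shows one plausible form of impostor selection and should be read as a hypothetical illustration, not the authors' algorithm. It assumes cosine similarity as the selection criterion, and it assumes that the background i-vectors closest to the target speaker are the most informative negatives. The function names `length_normalize` and `select_impostors` are illustrative.

```python
import numpy as np

def length_normalize(x):
    # Project each i-vector onto the unit sphere so dot products become cosine scores.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def select_impostors(target_ivecs, background_ivecs, num_impostors):
    """Hypothetical impostor selection: keep the background i-vectors that are,
    on average, most cosine-similar to the target speaker's i-vectors.
    Accepts one target i-vector (single-session) or several (multi-session)."""
    t = length_normalize(np.atleast_2d(np.asarray(target_ivecs, dtype=float)))
    b = length_normalize(np.asarray(background_ivecs, dtype=float))
    # scores[i, j]: cosine similarity between background i-vector i and target session j.
    scores = b @ t.T
    # Average over enrollment sessions so that multi-session targets are handled too.
    mean_scores = scores.mean(axis=1)
    # Indices of background i-vectors sorted from most to least similar.
    order = np.argsort(-mean_scores)
    return order[:num_impostors]
```

Averaging over sessions is only one way to pool multi-session enrollment. Taking the maximum per background i-vector would be an equally reasonable assumption.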
Findings
Impostor selection and UDBN adaptation improve DNN performance by 8-20%.
Proposed methods outperform baseline systems with up to 17% EER reduction.
Experiments cover both single- and multi-session enrollment scenarios.
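The findings are reported in terms of equal error rate (EER). EER is the operating point at which the miss rate equals the false-alarm rate. The sketch below approximates it by sweeping the decision threshold over every observed score and picking the threshold where the two error rates are closest. This is a simple illustration, assuming higher scores mean "same speaker". It is not the NIST scoring tool.

```python
import numpy as np

def equal_error_rate(target_scores, impostor_scores):
    """Approximate EER by sweeping the threshold over all observed scores."""
    target_scores = np.asarray(target_scores, dtype=float)
    impostor_scores = np.asarray(impostor_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target_scores, impostor_scores]))
    best_gap, eer = np.inf, None
    for thr in thresholds:
        # Miss: a genuine (target) trial scored below the threshold is rejected.
        p_miss = np.mean(target_scores < thr)
        # False alarm: an impostor trial scored at or above the threshold is accepted.
        p_fa = np.mean(impostor_scores >= thr)
        gap = abs(p_miss - p_fa)
        if gap < best_gap:
            best_gap, eer = gap, (p_miss + p_fa) / 2
    return eer
```

For example, `equal_error_rate([0.9, 0.8, 0.7, 0.3], [0.1, 0.2, 0.4, 0.75])` returns `0.25`. At threshold 0.7, one of four target trials is missed and one of four impostor trials is accepted.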
Abstract
The promising performance of Deep Learning (DL) in speech recognition has motivated the use of DL in other speech technology applications such as speaker recognition. Given i-vectors as inputs, the authors proposed an impostor selection algorithm and a universal model adaptation process in a hybrid system based on Deep Belief Networks (DBN) and Deep Neural Networks (DNN) to discriminatively model each target speaker. To gain more insight into the behavior of DL techniques in single- and multi-session speaker enrollment, experiments are carried out in both scenarios. Additionally, the parameters of the global model, referred to as the universal DBN (UDBN), are normalized before adaptation. UDBN normalization facilitates training DNNs, especially those with more than one hidden layer. Experiments are performed on the NIST SRE 2006 corpus. It is shown…
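The abstract describes a global model (UDBN) that is adapted to each target speaker. The sketch below is a minimal, single-layer illustration of that idea. It makes several assumptions. First, it uses a Gaussian-Bernoulli RBM trained with one-step contrastive divergence (CD-1), applied to i-vectors standardized to zero mean and unit variance. Second, it assumes "adaptation" means starting CD-1 from the universal parameters rather than from random ones, and running only a few epochs on the target's and the selected impostors' i-vectors. The paper's UDBN normalization step is not reproduced here. All function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_epochs(W, b_vis, b_hid, data, epochs, lr, rng):
    """CD-1 for a Gaussian-Bernoulli RBM with unit-variance visible units.
    Works on copies, so the parameters passed in are left unchanged."""
    W, b_vis, b_hid = W.copy(), b_vis.copy(), b_hid.copy()
    n = data.shape[0]
    for _ in range(epochs):
        # Positive phase: hidden activation probabilities given the data.
        h_prob = sigmoid(data @ W + b_hid)
        h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
        # Negative phase: Gaussian visible units reconstructed by their mean.
        v_recon = h_sample @ W.T + b_vis
        h_recon = sigmoid(v_recon @ W + b_hid)
        # Approximate gradient: <v h>_data - <v h>_reconstruction.
        W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / n
        b_vis += lr * (data - v_recon).mean(axis=0)
        b_hid += lr * (h_prob - h_recon).mean(axis=0)
    return W, b_vis, b_hid

def train_universal_rbm(background, num_hidden, epochs=20, lr=0.01, seed=0):
    # Universal model: trained once on all (standardized) background i-vectors.
    rng = np.random.default_rng(seed)
    W = 0.01 * rng.standard_normal((background.shape[1], num_hidden))
    b_vis = np.zeros(background.shape[1])
    b_hid = np.zeros(num_hidden)
    return cd1_epochs(W, b_vis, b_hid, background, epochs, lr, rng)

def adapt_to_speaker(universal, target_ivecs, impostor_ivecs, epochs=5, lr=0.01, seed=1):
    # Adaptation: initialize from the universal parameters and run a few CD-1
    # epochs on the speaker-specific set (target plus selected impostors).
    rng = np.random.default_rng(seed)
    data = np.vstack([target_ivecs, impostor_ivecs])
    return cd1_epochs(*universal, data, epochs, lr, rng)
```

In a hybrid DBN-DNN setup, the adapted weights would then initialize a network that is fine-tuned discriminatively (target vs. impostors). That supervised step is omitted from this sketch.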
