The IBM Speaker Recognition System: Recent Advances and Error Analysis
Seyed Omid Sadjadi, Jason Pelecanos, Sriram Ganapathy

TL;DR
This paper details recent improvements to IBM's speaker recognition system for conversational speech, covering feature extraction and modeling techniques, reports state-of-the-art results on the NIST 2010 SRE conditions, and analyzes the remaining errors.
Contribution
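Introduces nearest-neighbor discriminant analysis (NDA) in place of LDA for intersession variability compensation in the i-vector space, speaker- and channel-adapted features derived from an ASR system, and a DNN acoustic model with a very large number of output units (~10k senones) for computing the frame-level soft alignments used in i-vector estimation.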
Findings
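To the authors' knowledge, achieved the best published results to date on the NIST 2010 SRE extended core conditions (C1-C9) and the 10sec-10sec condition
Attained an equal error rate (EER) of 0.59% on the extended tel-tel condition (C5)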
Error analysis identified issues with low-quality recordings and their impact on performance
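For readers unfamiliar with the metric quoted above, the EER is the operating point at which the miss rate and the false-alarm rate are equal. The following is a minimal sketch of how it can be approximated from trial scores. The function name `equal_error_rate` and the example scores are illustrative assumptions; this is not the NIST scoring tool or the authors' evaluation code.

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Hypothetical helper for illustration only.
    """
    target_scores = np.asarray(target_scores, dtype=float)
    nontarget_scores = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))

    # Miss rate: fraction of target trials scoring below the threshold.
    miss = np.array([(target_scores < t).mean() for t in thresholds])
    # False-alarm rate: fraction of non-target trials at or above the threshold.
    false_alarm = np.array([(nontarget_scores >= t).mean() for t in thresholds])

    # Pick the threshold where the two error rates are closest and average them.
    idx = np.argmin(np.abs(miss - false_alarm))
    return 0.5 * (miss[idx] + false_alarm[idx])

# Made-up scores: higher means "more likely the same speaker".
eer = equal_error_rate([2.0, 3.0, 4.0], [-1.0, 0.0, 1.0])
print(f"EER = {100 * eer:.2f}%")
```

With a finite trial list the two error curves rarely cross exactly, which is why the sketch averages the two rates at the closest threshold; evaluation tools typically interpolate instead.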
Abstract
We present the recent advances along with an error analysis of the IBM speaker recognition system for conversational speech. Some of the key advancements that contribute to our system include: a nearest-neighbor discriminant analysis (NDA) approach (as opposed to LDA) for intersession variability compensation in the i-vector space, the application of speaker and channel-adapted features derived from an automatic speech recognition (ASR) system for speaker recognition, and the use of a DNN acoustic model with a very large number of output units (~10k senones) to compute the frame-level soft alignments required in the i-vector estimation process. We evaluate these techniques on the NIST 2010 SRE extended core conditions (C1-C9), as well as the 10sec-10sec condition. To our knowledge, results achieved by our system represent the best performances published to date on these conditions. For…
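The frame-level soft alignments mentioned in the abstract are typically used to accumulate zeroth- and first-order Baum-Welch statistics, which are the inputs to i-vector estimation. The sketch below shows that accumulation step under stated assumptions: `posteriors` stands in for per-frame senone posteriors from a DNN acoustic model, `features` for the acoustic feature vectors, and the function name `baum_welch_stats` is hypothetical. It illustrates the general technique and is not the authors' implementation.

```python
import numpy as np

def baum_welch_stats(features, posteriors):
    """Accumulate zeroth- and first-order statistics for one utterance.

    features:   (T, D) array, one D-dimensional feature vector per frame.
    posteriors: (T, C) array, soft alignment of each frame to C classes
                (assumed here to be DNN senone posteriors; rows sum to 1).
    """
    features = np.asarray(features, dtype=float)
    posteriors = np.asarray(posteriors, dtype=float)

    # Zeroth-order statistics: soft frame count per class, shape (C,).
    zeroth = posteriors.sum(axis=0)
    # First-order statistics: posterior-weighted feature sums, shape (C, D).
    first = posteriors.T @ features
    return zeroth, first

# Toy example: 3 frames, 2-dimensional features, 2 classes.
feats = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
post = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
zeroth, first = baum_welch_stats(feats, post)
print(zeroth.shape, first.shape)
```

In a real system C would be on the order of the ~10k senones mentioned above, so the statistics are large and sparse; the toy dimensions here are chosen only for readability.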
