The IBM Speaker Recognition System: Recent Advances and Error Analysis

Seyed Omid Sadjadi; Jason Pelecanos; Sriram Ganapathy

arXiv:1605.01635·cs.CL·May 6, 2016

TL;DR

This paper describes recent improvements to IBM's speaker recognition system for conversational speech, covering both feature extraction and modeling. It reports what the authors believe are the best published results on the NIST 2010 SRE extended core conditions (C1-C9) and the 10sec-10sec condition, and analyzes the remaining errors.

Contribution

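Introduces nearest-neighbor discriminant analysis (NDA), used in place of LDA, for intersession variability compensation in the i-vector space; speaker- and channel-adapted features derived from an automatic speech recognition (ASR) system; and a DNN acoustic model with a very large number of output units (~10k senones) that computes the frame-level soft alignments needed for i-vector estimation.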

Findings

01

To the authors' knowledge, achieved the best results published to date on the NIST 2010 SRE extended core conditions (C1-C9) and the 10sec-10sec condition

02

The system attained an equal error rate (EER) of 0.59% on the extended tel-tel condition (C5)

03

Error analysis identified issues with low-quality recordings and their impact on performance
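
For readers unfamiliar with the metric quoted above, the equal error rate (EER) is the operating point at which the miss rate on target trials equals the false-alarm rate on non-target trials. The following is a minimal sketch of that definition, not the scoring tool used in the paper or by NIST; the function name `equal_error_rate` and the simple threshold sweep are assumptions made for illustration.

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Hypothetical helper for illustration only; official evaluations use
    dedicated scoring software with interpolation between operating points.
    """
    target_scores = np.asarray(target_scores, dtype=float)
    nontarget_scores = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))

    # Miss rate: fraction of target trials scored below the threshold.
    miss = np.array([(target_scores < t).mean() for t in thresholds])
    # False-alarm rate: fraction of non-target trials at or above the threshold.
    fa = np.array([(nontarget_scores >= t).mean() for t in thresholds])

    # Pick the threshold where the two error rates are closest and average them.
    idx = np.argmin(np.abs(miss - fa))
    return 0.5 * (miss[idx] + fa[idx])
```

With a finite trial list the two error curves rarely cross exactly at an observed score, so this sketch returns the average of the two rates at the closest point.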

Abstract

We present the recent advances along with an error analysis of the IBM speaker recognition system for conversational speech. Some of the key advancements that contribute to our system include: a nearest-neighbor discriminant analysis (NDA) approach (as opposed to LDA) for intersession variability compensation in the i-vector space, the application of speaker and channel-adapted features derived from an automatic speech recognition (ASR) system for speaker recognition, and the use of a DNN acoustic model with a very large number of output units (~10k senones) to compute the frame-level soft alignments required in the i-vector estimation process. We evaluate these techniques on the NIST 2010 SRE extended core conditions (C1-C9), as well as the 10sec-10sec condition. To our knowledge, results achieved by our system represent the best performances published to date on these conditions. For…
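
The abstract mentions frame-level soft alignments that feed i-vector estimation. In the standard i-vector recipe, those per-frame posteriors are accumulated into zeroth- and first-order sufficient statistics. The sketch below shows only that accumulation step under generic assumptions; it is not the IBM implementation, and the function name `baum_welch_stats` and the array shapes are chosen here for illustration. The posteriors could come from a GMM or, as in the paper, from a DNN's senone outputs.

```python
import numpy as np

def baum_welch_stats(posteriors, features):
    """Accumulate sufficient statistics from frame-level soft alignments.

    posteriors: (T, C) array, one posterior distribution over C classes
                (e.g. senones) per frame; assumed to be already computed.
    features:   (T, D) array of acoustic feature vectors for the same frames.
    Returns the zeroth-order stats (C,) and first-order stats (C, D).
    """
    posteriors = np.asarray(posteriors, dtype=float)
    features = np.asarray(features, dtype=float)

    # Zeroth order: soft count of frames assigned to each class.
    zeroth = posteriors.sum(axis=0)
    # First order: posterior-weighted sum of feature vectors per class.
    first = posteriors.T @ features
    return zeroth, first
```

A larger number of output classes gives a finer partition of the acoustic space, at the cost of a first-order statistic that grows linearly with the class count.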
