The IBM 2016 Speaker Recognition System
Seyed Omid Sadjadi, Sriram Ganapathy, Jason W. Pelecanos

TL;DR
This paper presents advances in IBM's i-vector speaker recognition system for conversational speech: nearest-neighbor discriminant analysis, speaker- and channel-adapted features derived from ASR, and a DNN acoustic model with a large senone set, leading to state-of-the-art results on NIST SRE 2010 data.
Contribution
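The paper introduces a nearest-neighbor discriminant analysis (NDA) approach, uses speaker- and channel-adapted features derived from an ASR system, and employs a DNN acoustic model with a large number of output units (~10k senones) to improve speaker recognition performance.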
Findings
NDA outperforms traditional LDA, with up to a 35% relative reduction in equal error rate (EER).
ASR speaker-adapted features improve recognition accuracy.
Increasing the number of DNN output units (senones) from 2k to 10k improves performance.
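The "relative" figure above is a ratio against the baseline error rate, not a difference in percentage points. The sketch below shows that arithmetic only; the function name and the two EER values are hypothetical, chosen for illustration, and are not results from the paper.

```python
def relative_reduction(baseline_eer: float, new_eer: float) -> float:
    """Return the EER reduction as a fraction of the baseline EER."""
    if baseline_eer <= 0:
        raise ValueError("baseline EER must be positive")
    return (baseline_eer - new_eer) / baseline_eer


# Hypothetical EER values in percent (NOT taken from the paper).
lda_eer = 2.0
nda_eer = 1.3

reduction = relative_reduction(lda_eer, nda_eer)
print(f"absolute change: {lda_eer - nda_eer:.1f} points")
print(f"relative reduction: {reduction:.0%}")
```

With these made-up numbers, a drop of 0.7 percentage points corresponds to a relative reduction of roughly 35%, which is why relative gains can look large even when absolute EERs are already low.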
Abstract
In this paper we describe the recent advancements made in the IBM i-vector speaker recognition system for conversational speech. In particular, we identify key techniques that contribute to significant improvements in performance of our system, and quantify their contributions. The techniques include: 1) a nearest-neighbor discriminant analysis (NDA) approach that is formulated to alleviate some of the limitations associated with the conventional linear discriminant analysis (LDA) that assumes Gaussian class-conditional distributions, 2) the application of speaker- and channel-adapted features, which are derived from an automatic speech recognition (ASR) system, for speaker recognition, and 3) the use of a deep neural network (DNN) acoustic model with a large number of output units (~10k senones) to compute the frame-level soft alignments required in the i-vector estimation process. We…
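As a rough illustration of the third technique, the following minimal sketch shows how frame-level soft alignments are typically turned into the zeroth- and first-order statistics consumed by a standard i-vector extractor. It assumes the DNN outputs are available as per-frame logits over senones; the function names, the toy dimensions (3 senones instead of ~10k), and the input values are all hypothetical and not taken from the paper.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Convert one frame's senone logits into posterior probabilities."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def accumulate_stats(
    features: List[List[float]], logits: List[List[float]]
) -> Tuple[List[float], List[List[float]]]:
    """Accumulate zeroth-order (N_c) and first-order (F_c) statistics.

    N_c = sum_t gamma_t(c)        soft count of frames aligned to senone c
    F_c = sum_t gamma_t(c) * x_t  posterior-weighted sum of feature vectors
    """
    num_senones = len(logits[0])
    dim = len(features[0])
    zeroth = [0.0] * num_senones
    first = [[0.0] * dim for _ in range(num_senones)]
    for x, z in zip(features, logits):
        posteriors = softmax(z)  # the frame-level soft alignment
        for c, gamma in enumerate(posteriors):
            zeroth[c] += gamma
            for d in range(dim):
                first[c][d] += gamma * x[d]
    return zeroth, first


# Toy input: 2 frames, 2-dimensional features, 3 senones (all hypothetical).
frames = [[1.0, 2.0], [3.0, 4.0]]
dnn_logits = [[2.0, 0.5, -1.0], [0.0, 1.0, 3.0]]
n_stats, f_stats = accumulate_stats(frames, dnn_logits)
print(n_stats)
```

Because each frame's posteriors sum to one, the zeroth-order statistics sum to the number of frames. A larger senone set partitions the same frames into finer phonetic classes.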
Taxonomy
Methods: Linear Discriminant Analysis
