PLDA-Based Diarization of Telephone Conversations
Ahmet E. Bulut, Hakan Demir, Yusuf Ziya Isik, Hakan Erdogan

TL;DR
This paper applies PLDA with variational Bayes inference and deterministic annealing to speaker diarization of telephone conversations, achieving about a 20% relative reduction in Diarization Error Rate (DER) over a PCA + k-means baseline.
Contribution
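It introduces a PLDA-based diarization system that models segmental i-vectors with variational Bayes (VB) inference and deterministic annealing (DA), and it outperforms a baseline that applies k-means clustering to PCA coefficients of the same i-vectors.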
Findings
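About 20% relative reduction in Diarization Error Rate (DER) compared to the baseline
Deterministic annealing is used to avoid locally optimal solutions in the VB iterations
Outperforms the PCA + k-means baseline on summed-channel telephone data from NIST SRE 2008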
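
To give a feel for what deterministic annealing does, here is a minimal sketch on a toy problem. It is not the paper's VB inference under PLDA: it assumes a two-component Gaussian mixture with unit covariance and equal weights, and the function names, the temperature schedule `betas`, and the synthetic data are all hypothetical. The idea it illustrates is that the log-likelihood is scaled by an inverse temperature `beta` that rises to 1, so early iterations use very soft assignments and are less sensitive to initialization.

```python
import numpy as np

def annealed_posteriors(X, means, beta):
    """Soft assignments with the log-likelihood scaled by beta (0 < beta <= 1)."""
    # log N(x | mean_k, I) up to an additive constant; unit covariance is an assumption
    log_lik = -0.5 * ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    scaled = beta * log_lik
    scaled -= scaled.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(scaled)
    return post / post.sum(axis=1, keepdims=True)

def annealed_clustering(X, k=2, betas=(0.05, 0.2, 0.5, 1.0), n_iter=25, seed=0):
    """EM-style mean updates run at each inverse temperature in `betas`."""
    rng = np.random.default_rng(seed)
    # start with all means at the global mean (no informative initialization)
    means = np.tile(X.mean(axis=0), (k, 1))
    for beta in betas:
        # small jitter lets coincident means split once beta is large enough
        means = means + 1e-3 * rng.normal(size=means.shape)
        for _ in range(n_iter):
            post = annealed_posteriors(X, means, beta)
            means = (post.T @ X) / post.sum(axis=0)[:, None]
    return means, post.argmax(axis=1)

# Synthetic stand-in for two speakers; real input would be segmental i-vectors.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(+3.0, 1.0, size=(100, 2)),
               rng.normal(-3.0, 1.0, size=(100, 2))])
means, labels = annealed_clustering(X)
```

At `beta = 1` the update reduces to ordinary EM for this toy model, so the annealing only changes the path taken to the final solution.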
Abstract
This paper investigates the application of probabilistic linear discriminant analysis (PLDA) to speaker diarization of telephone conversations. We introduce a variational Bayes (VB) approach for inference under a PLDA model of segmental i-vectors in speaker diarization. A deterministic annealing (DA) algorithm is applied in order to avoid locally optimal solutions in the VB iterations. We compare our proposed system with a well-known system that applies k-means clustering to principal component analysis (PCA) coefficients of segmental i-vectors. We use summed-channel telephone data from the National Institute of Standards and Technology (NIST) 2008 Speaker Recognition Evaluation (SRE) as the test set to evaluate the performance of the proposed system. We achieve about 20% relative improvement in Diarization Error Rate (DER) compared to the baseline system.
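
The baseline described in the abstract, k-means clustering on PCA coefficients of segmental i-vectors, can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the helper names, the number of PCA components, and the synthetic vectors standing in for i-vectors are all hypothetical, and i-vector extraction itself is not shown.

```python
import numpy as np

def pca_project(X, n_components):
    """Project the rows of X onto the top principal directions."""
    Xc = X - X.mean(axis=0)
    # SVD of the centered data: rows of Vt are the principal directions
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # squared Euclidean distance from every point to every center
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # keep the old center if a cluster happens to be empty
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Synthetic stand-in for segmental i-vectors of a two-speaker conversation.
rng = np.random.default_rng(0)
dim = 20  # hypothetical i-vector dimension
spk_a = rng.normal(loc=+2.0, scale=1.0, size=(40, dim))
spk_b = rng.normal(loc=-2.0, scale=1.0, size=(40, dim))
ivectors = np.vstack([spk_a, spk_b])

# Two clusters, one per speaker in the telephone conversation.
segment_labels = kmeans(pca_project(ivectors, n_components=3), k=2)
```

Each entry of `segment_labels` assigns one segment to one of the two speakers; a diarization system would then map these segment labels back to time boundaries.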
