Probabilistic Back-ends for Online Speaker Recognition and Clustering
Alexey Sholokhov, Nikita Kuzmin, Kong Aik Lee, Eng Siong Chng

TL;DR
This paper studies scoring back-ends for online speaker recognition, shows that cosine scoring is poorly calibrated when the number of enrollment utterances varies, and introduces an extremely constrained PLDA-based replacement that improves multi-enrollment recognition and supports online clustering.
Contribution
It proposes a simple PLDA-based scoring method for multi-enrollment recognition and an online clustering algorithm that leverages PLDA's benefits for better calibration and uncertainty handling.
Findings
PLDA-based scoring outperforms cosine scoring in multi-enrollment scenarios.
The proposed online clustering algorithm improves recognition accuracy.
The proposed model matches cosine scoring in one-to-one comparisons.
Abstract
This paper focuses on multi-enrollment speaker recognition, which naturally occurs in the task of online speaker clustering, and studies the properties of different scoring back-ends in this scenario. First, we show that the popular cosine scoring suffers from poor score calibration when the number of enrollment utterances varies. Second, we propose a simple replacement for cosine scoring based on an extremely constrained version of probabilistic linear discriminant analysis (PLDA). The proposed model improves over cosine scoring for multi-enrollment recognition while keeping the same performance in the case of one-to-one comparisons. Finally, we consider an online speaker clustering task where each step naturally involves multi-enrollment recognition. We propose an online clustering algorithm that lets us benefit from properties of the PLDA model such as the ability to handle uncertainty and…
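The contrast between the two back-ends in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not spell out the constraint, so the sketch assumes a two-covariance PLDA model with zero mean and isotropic covariances (between-speaker `b * I`, within-speaker `w * I`). The function names and the default values of `b` and `w` are hypothetical. Cosine scoring averages the enrollment embeddings and discards how many there were, whereas the PLDA log-likelihood ratio depends on the enrollment count `n` through the posterior over the speaker variable.

```python
import numpy as np


def length_normalize(x):
    """Project embeddings onto the unit sphere (last axis)."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def cosine_score(enroll, test):
    """Cosine similarity between the averaged enrollment embedding and the test embedding."""
    centroid = length_normalize(np.atleast_2d(enroll)).mean(axis=0)
    return float(length_normalize(centroid) @ length_normalize(test))


def spherical_plda_llr(enroll, test, b=1.0, w=1.0):
    """Log-likelihood ratio under an ASSUMED isotropic two-covariance model.

    Speaker variable y ~ N(0, b*I); each embedding x | y ~ N(y, w*I).
    b and w are hypothetical values, not taken from the paper.
    """
    enroll = length_normalize(np.atleast_2d(enroll))
    test = length_normalize(test)
    n, d = enroll.shape

    # Posterior over the speaker variable given the n enrollment embeddings.
    post_var = b * w / (w + n * b)
    post_mean = b * enroll.sum(axis=0) / (w + n * b)

    # Predictive variance for the test embedding under each hypothesis.
    same_var = post_var + w   # same speaker as the enrollment set
    diff_var = b + w          # unrelated speaker

    log_same = (-0.5 * d * np.log(2 * np.pi * same_var)
                - np.sum((test - post_mean) ** 2) / (2 * same_var))
    log_diff = (-0.5 * d * np.log(2 * np.pi * diff_var)
                - np.sum(test ** 2) / (2 * diff_var))
    return float(log_same - log_diff)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    speaker = rng.normal(size=16)
    enroll = speaker + 0.5 * rng.normal(size=(4, 16))  # four noisy enrollments
    test = speaker + 0.5 * rng.normal(size=16)
    for n in (1, 2, 4):
        print(n, cosine_score(enroll[:n], test), spherical_plda_llr(enroll[:n], test))
```

Under these assumptions, with a single length-normalized enrollment embedding the log-likelihood ratio is an increasing affine function of the cosine similarity, so the two scores rank trials identically. This is one way to read the abstract's statement that the constrained model keeps the same performance in one-to-one comparisons.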
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Text and Document Classification Technologies
