Probabilistic Back-ends for Online Speaker Recognition and Clustering

Alexey Sholokhov; Nikita Kuzmin; Kong Aik Lee; Eng Siong Chng

arXiv:2302.09523 · eess.AS · February 21, 2023

TL;DR

This paper investigates scoring methods for online speaker recognition, identifies limitations of cosine scoring, and introduces a constrained PLDA-based approach that improves multi-enrollment recognition and online clustering performance.

Contribution

The paper proposes a simple PLDA-based scoring method for multi-enrollment recognition, together with an online clustering algorithm that builds on PLDA's advantages: better calibration and uncertainty handling.
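To make the link between online clustering and multi-enrollment scoring concrete, here is a minimal sketch of a generic greedy online clustering loop. It is not the paper's algorithm, which this page does not describe. The function names `online_cluster` and `centroid_cosine`, the threshold value, and the toy 2-D embeddings are all hypothetical. Each incoming embedding is scored against every existing cluster, and each cluster acts as an enrollment set of varying size.

```python
import numpy as np

def centroid_cosine(members, x):
    """Placeholder scorer (assumption): cosine between the cluster centroid and x."""
    c = members.mean(axis=0)
    return float(c @ x / (np.linalg.norm(c) * np.linalg.norm(x)))

def online_cluster(embeddings, score_fn, threshold):
    """Greedy online clustering: assign each embedding as it arrives.

    An embedding joins the best-scoring cluster if that score reaches
    `threshold`; otherwise it starts a new cluster. Past decisions are
    never revisited.
    """
    clusters = []  # each cluster is a list of member embeddings
    labels = []
    for x in embeddings:
        # Multi-enrollment step: every cluster is an enrollment set.
        scores = [score_fn(np.stack(c), x) for c in clusters]
        best = int(np.argmax(scores)) if scores else -1
        if best >= 0 and scores[best] >= threshold:
            clusters[best].append(x)
            labels.append(best)
        else:
            clusters.append([x])
            labels.append(len(clusters) - 1)
    return labels

# Toy data (hypothetical): two well-separated directions in 2-D.
toy = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 0.05]])
labels = online_cluster(toy, centroid_cosine, threshold=0.8)
```

Because one threshold is applied to clusters of every size, the loop only behaves well if scores stay comparable as the number of members grows. That is the calibration property the paper examines.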

Findings

01. PLDA-based scoring outperforms cosine scoring in multi-enrollment scenarios.

02. The proposed online clustering algorithm improves recognition accuracy.

03. The new method maintains performance in one-to-one comparisons.

Abstract

This paper focuses on multi-enrollment speaker recognition which naturally occurs in the task of online speaker clustering, and studies the properties of different scoring back-ends in this scenario. First, we show that popular cosine scoring suffers from poor score calibration with a varying number of enrollment utterances. Second, we propose a simple replacement for cosine scoring based on an extremely constrained version of probabilistic linear discriminant analysis (PLDA). The proposed model improves over the cosine scoring for multi-enrollment recognition while keeping the same performance in the case of one-to-one comparisons. Finally, we consider an online speaker clustering task where each step naturally involves multi-enrollment recognition. We propose an online clustering algorithm allowing us to take benefits from the PLDA model such as the ability to handle uncertainty and…
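The two scoring back-ends contrasted in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation. The abstract does not say how the PLDA model is constrained, so the sketch assumes a two-covariance PLDA with isotropic between-speaker variance `b` and within-speaker variance `w`. The function names and the default values of `b` and `w` are hypothetical.

```python
import numpy as np

def cosine_score(enroll, test):
    """Cosine similarity between the mean enrollment embedding and the test one."""
    centroid = enroll.mean(axis=0)
    return float(centroid @ test / (np.linalg.norm(centroid) * np.linalg.norm(test)))

def spherical_plda_llr(enroll, test, b=1.0, w=1.0):
    """Log-likelihood ratio (same vs. different speaker), isotropic PLDA.

    Assumed model: speaker mean y ~ N(0, b*I), embedding x | y ~ N(y, w*I).
    `enroll` has shape (n, d); `test` has shape (d,).
    """
    n, d = enroll.shape
    # Posterior over the speaker mean given n enrollment embeddings.
    post_var = b * w / (w + n * b)
    post_mean = (n * b / (w + n * b)) * enroll.mean(axis=0)
    # Predictive variance of the test embedding under each hypothesis.
    same_var = post_var + w
    diff_var = b + w
    log_same = (-0.5 * d * np.log(2 * np.pi * same_var)
                - np.sum((test - post_mean) ** 2) / (2 * same_var))
    log_diff = (-0.5 * d * np.log(2 * np.pi * diff_var)
                - np.sum(test ** 2) / (2 * diff_var))
    return float(log_same - log_diff)

# Hypothetical data: unit-length random embeddings, one enrollment utterance.
rng = np.random.default_rng(0)
enroll_one = rng.normal(size=(1, 8))
enroll_one /= np.linalg.norm(enroll_one)
tests = rng.normal(size=(5, 8))
tests /= np.linalg.norm(tests, axis=1, keepdims=True)

cos_scores = [cosine_score(enroll_one, t) for t in tests]
llr_scores = [spherical_plda_llr(enroll_one, t) for t in tests]
```

With a single enrollment embedding and length-normalized inputs, this LLR is an increasing affine function of the cosine score, so both back-ends rank trials identically. This is consistent with the abstract's claim of unchanged one-to-one performance. With more enrollment utterances, the posterior variance `post_var` shrinks as `n` grows, so the LLR accounts for the size of the enrollment set, whereas the cosine of the centroid does not.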

Code & Models

Repositories

sholokhovalexey/online-speaker-clustering (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Text and Document Classification Technologies