Unifying Cosine and PLDA Back-ends for Speaker Verification
Zhiyuan Peng, Xuanji He, Ke Ding, Tan Lee, Guanglu Wan

TL;DR
This paper demonstrates that cosine scoring is a special case of PLDA scoring in speaker verification, explaining why PLDA often underperforms with neural embeddings and highlighting conditions where each method excels.
Contribution
It unifies the cosine and PLDA back-ends. It shows that cosine scoring is equivalent to PLDA scoring under particular parameter settings, and it analyzes their performance differences under domain-matched and domain-mismatched conditions.
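The following is a minimal numerical sketch of the special-case claim. It makes two assumptions. First, it uses the common two-covariance form of PLDA, x = m + y + ε, with speaker variable y ~ N(0, Σ_b) and residual ε ~ N(0, Σ_w). Second, it picks one illustrative parameter setting: zero mean, Σ_w = I, isotropic Σ_b = φI, and length-normalised embeddings. This is not necessarily the exact parameterisation used in the paper. Under these assumptions the PLDA log-likelihood ratio works out analytically to φ/(1 + 2φ) · cos(x1, x2) plus a constant. The helper names (`gaussian_logpdf`, `plda_llr`, `cosine`) are hypothetical.

```python
import numpy as np


def gaussian_logpdf(x, cov):
    """Log-density of a zero-mean multivariate Gaussian N(0, cov) evaluated at x."""
    d = x.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    quad = x @ np.linalg.solve(cov, x)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + quad)


def plda_llr(x1, x2, sigma_b, sigma_w, mean=None):
    """Two-covariance PLDA log-likelihood ratio: same speaker vs. different speakers.

    Assumed model: x = mean + y + eps, y ~ N(0, sigma_b), eps ~ N(0, sigma_w).
    """
    if mean is None:
        mean = np.zeros_like(x1)
    z = np.concatenate([x1 - mean, x2 - mean])
    sigma_t = sigma_b + sigma_w          # total covariance of a single embedding
    zeros = np.zeros_like(sigma_b)
    # Same speaker: both embeddings share y, so they covary through sigma_b.
    cov_same = np.block([[sigma_t, sigma_b], [sigma_b, sigma_t]])
    # Different speakers: the two embeddings are independent.
    cov_diff = np.block([[sigma_t, zeros], [zeros, sigma_t]])
    return gaussian_logpdf(z, cov_same) - gaussian_logpdf(z, cov_diff)


def cosine(x1, x2):
    """Plain cosine similarity between two vectors."""
    return float(x1 @ x2 / (np.linalg.norm(x1) * np.linalg.norm(x2)))


# Illustrative special-case setting (assumed, not taken from the paper).
rng = np.random.default_rng(0)
d, phi = 8, 2.0
sigma_b = phi * np.eye(d)   # isotropic between-speaker covariance
sigma_w = np.eye(d)         # identity within-speaker covariance

trials = rng.standard_normal((5, 2, d))                   # 5 trials, each a pair of embeddings
trials /= np.linalg.norm(trials, axis=2, keepdims=True)   # length-normalise every embedding

cos_scores = np.array([cosine(a, b) for a, b in trials])
llr_scores = np.array([plda_llr(a, b, sigma_b, sigma_w) for a, b in trials])

# Analytic prediction: llr = slope * cos + constant, with slope = phi / (1 + 2 * phi).
slope = phi / (1 + 2 * phi)
offsets = llr_scores - slope * cos_scores
print(np.allclose(offsets, offsets[0]))  # constant offset => PLDA is an increasing affine map of cosine
```

The slope is positive, so the map is strictly increasing and both back-ends rank every trial identically. Ranking-based metrics such as EER are therefore the same. Replacing `sigma_b` with a non-isotropic matrix, or skipping length normalisation, breaks this affine relation.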
Findings
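Cosine scoring is a special case of PLDA scoring under appropriate parameter settings.
The dimensional independence assumption required by cosine scoring accounts for most of the performance gap between the two back-ends under domain-matched conditions.
PLDA outperforms cosine scoring under severe domain mismatch, when the dimensional independence assumption no longer holds.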
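A second sketch shows why an assumption about the embedding dimensions matters. Cosine similarity is unchanged when both embeddings are rotated. PLDA with isotropic covariances is also unchanged. This sketch uses a between-speaker covariance with correlated dimensions as an assumed stand-in for a violated dimensional-independence assumption. With that covariance, the PLDA score depends on how the pair is oriented, while cosine scoring cannot see the difference. The covariance values and the helper name `plda_llr` are hypothetical and chosen only for illustration.

```python
import numpy as np


def plda_llr(x1, x2, sigma_b, sigma_w):
    """Zero-mean two-covariance PLDA log-likelihood ratio (same vs. different speaker)."""
    d = x1.shape[0]
    z = np.concatenate([x1, x2])
    sigma_t = sigma_b + sigma_w
    zeros = np.zeros((d, d))
    cov_same = np.block([[sigma_t, sigma_b], [sigma_b, sigma_t]])
    cov_diff = np.block([[sigma_t, zeros], [zeros, sigma_t]])

    def log_gauss(cov):
        # The 2*pi normalising term is identical for both hypotheses and cancels.
        _, logdet = np.linalg.slogdet(cov)
        return -0.5 * (logdet + z @ np.linalg.solve(cov, z))

    return log_gauss(cov_same) - log_gauss(cov_diff)


rng = np.random.default_rng(1)
d = 6
x1, x2 = rng.standard_normal((2, d))
x1, x2 = x1 / np.linalg.norm(x1), x2 / np.linalg.norm(x2)

# Random orthogonal matrix: rotating both embeddings leaves their cosine unchanged.
rotation, _ = np.linalg.qr(rng.standard_normal((d, d)))
r1, r2 = rotation @ x1, rotation @ x2

sigma_w = np.eye(d)
isotropic = 2.0 * np.eye(d)                        # independent, equal-variance dimensions
mixing = rng.standard_normal((d, d))
correlated = mixing @ mixing.T + 0.5 * np.eye(d)   # dimensions are correlated

for name, sigma_b in [("isotropic", isotropic), ("correlated", correlated)]:
    before = plda_llr(x1, x2, sigma_b, sigma_w)
    after = plda_llr(r1, r2, sigma_b, sigma_w)
    print(f"{name:>10}: {before:.4f} -> {after:.4f}")
```

In the isotropic case the two printed scores match. In the correlated case they generally differ, even though the cosine of the pair has not changed.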
Abstract
State-of-the-art speaker verification (SV) systems use a back-end model to score the similarity of speaker embeddings extracted from a neural network model. The commonly used back-end models are cosine scoring and probabilistic linear discriminant analysis (PLDA) scoring. With recently developed neural embeddings, the theoretically more appealing PLDA approach has been found to offer no advantage over, or even to be inferior to, simple cosine scoring in terms of SV system performance. This paper presents an investigation of the relation between the two scoring approaches, aiming to explain this counter-intuitive observation. It is shown that cosine scoring is essentially a special case of PLDA scoring. In other words, by properly setting the parameters of PLDA, the two back-ends become equivalent. As a consequence, cosine scoring not only inherits the basic assumptions for…
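The abstract refers to setting the parameters of PLDA. In practice these parameters are estimated from speaker-labelled embeddings. One plausible reason PLDA can help under domain mismatch is that these parameters can be re-estimated on target-domain data, whereas cosine scoring has no parameters to adapt. The sketch below uses simple moment estimates of the mean, within-speaker covariance, and between-speaker covariance on synthetic data. It is a swapped-in illustration, not the training procedure used in the paper; PLDA is often trained with EM instead. The function name `estimate_plda_params` and all data settings are hypothetical.

```python
import numpy as np


def estimate_plda_params(embeddings, labels):
    """Moment estimates for a two-covariance PLDA model (hypothetical helper).

    Returns (mean, sigma_b, sigma_w). Assumes every speaker has the same number
    of utterances n, so that cov(speaker means) = sigma_b + sigma_w / n.
    """
    speakers = np.unique(labels)                      # sorted speaker ids
    mean = embeddings.mean(axis=0)
    spk_means = np.stack([embeddings[labels == s].mean(axis=0) for s in speakers])
    # Within-speaker covariance: pooled scatter around each speaker's own mean.
    centred = embeddings - spk_means[np.searchsorted(speakers, labels)]
    sigma_w = centred.T @ centred / (len(embeddings) - len(speakers))
    # Between-speaker covariance: spread of the speaker means, minus the residual
    # within-speaker noise that those means still carry.
    n_per_spk = len(embeddings) // len(speakers)
    sigma_b = np.cov(spk_means, rowvar=False) - sigma_w / n_per_spk
    return mean, sigma_b, sigma_w


# Synthetic data drawn from an assumed ground-truth model.
rng = np.random.default_rng(0)
d, n_spk, n_utt = 3, 2000, 20
true_b = np.diag([2.0, 1.0, 0.5])
true_w = 0.5 * np.eye(d)
y = rng.multivariate_normal(np.zeros(d), true_b, size=n_spk)   # one speaker variable per speaker
labels = np.repeat(np.arange(n_spk), n_utt)
eps = rng.multivariate_normal(np.zeros(d), true_w, size=n_spk * n_utt)
embeddings = 1.0 + y[labels] + eps                               # assumed global mean of 1.0

mean, sigma_b, sigma_w = estimate_plda_params(embeddings, labels)
print(np.round(sigma_b, 2))
print(np.round(sigma_w, 2))
```

The estimates should land close to `true_b` and `true_w`. Running the same estimator on labelled embeddings from a new domain is one simple way to adapt a PLDA back-end.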
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing
