Scoring of Large-Margin Embeddings for Speaker Verification: Cosine or PLDA?
Qiongqiong Wang, Kong Aik Lee, Tianchi Liu

TL;DR
This paper compares cosine similarity and PLDA scoring for speaker embeddings trained with large-margin losses, showing that such training often makes PLDA unnecessary and favors cosine scoring for speaker verification.
Contribution
The paper demonstrates that large-margin softmax losses produce embeddings with high intra-speaker compactness, which reduces the need for parametric back-ends such as PLDA and explains why simple cosine scoring works well.
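To make the idea of cosine scoring concrete, here is a minimal sketch in Python with NumPy. It is not the authors' code: the embedding dimension, the noise level, and the decision threshold are hypothetical values chosen for illustration, and the embeddings are synthetic vectors rather than ECAPA-TDNN outputs. The sketch only shows that when same-speaker embeddings cluster tightly around a common direction, a plain cosine similarity already separates target from non-target trials.

```python
import numpy as np


def cosine_score(enroll: np.ndarray, test: np.ndarray) -> float:
    """Cosine similarity between an enrollment and a test embedding."""
    enroll = enroll / np.linalg.norm(enroll)
    test = test / np.linalg.norm(test)
    return float(np.dot(enroll, test))


# Synthetic stand-ins for speaker embeddings (all values are assumptions).
rng = np.random.default_rng(0)
dim = 192     # hypothetical embedding dimension
noise = 0.1   # small within-speaker spread, mimicking compact clusters

center_a = rng.standard_normal(dim)  # "speaker A" direction
center_b = rng.standard_normal(dim)  # "speaker B" direction

enroll_a = center_a + noise * rng.standard_normal(dim)
test_a = center_a + noise * rng.standard_normal(dim)
test_b = center_b + noise * rng.standard_normal(dim)

target_score = cosine_score(enroll_a, test_a)     # same speaker
nontarget_score = cosine_score(enroll_a, test_b)  # different speakers

threshold = 0.5  # hypothetical operating point, not taken from the paper
accept_target = target_score >= threshold
accept_nontarget = nontarget_score >= threshold
```

With compact clusters, the target score sits close to 1 while the non-target score stays near 0, so a fixed threshold is enough in this toy setting. Real systems would tune the threshold on development data.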
Findings
Large-margin training reduces the need for a PLDA back-end.
Cosine scoring outperforms PLDA in large-margin embedding scenarios.
Pre-processing techniques have limited impact on large-margin embeddings.
Abstract
The emergence of large-margin softmax cross-entropy losses in training deep speaker embedding neural networks has triggered a gradual shift from parametric back-ends to a simpler cosine similarity measure for speaker verification. Popular parametric back-ends include the probabilistic linear discriminant analysis (PLDA) and its variants. This paper investigates the properties of margin-based cross-entropy losses leading to such a shift and aims to find scoring back-ends best suited for speaker verification. In addition, we revisit the pre-processing techniques which have been widely used in the past and assess their effectiveness on large-margin embeddings. Experiments on the state-of-the-art ECAPA-TDNN networks trained with various large-margin softmax cross-entropy losses show a substantial increment in intra-speaker compactness making the conventional PLDA superfluous. In this…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
Methods: Softmax
