Scoring of Large-Margin Embeddings for Speaker Verification: Cosine or PLDA?

Qiongqiong Wang; Kong Aik Lee; Tianchi Liu

arXiv:2204.03965 · eess.AS · April 12, 2022

Open Access

TL;DR

This paper compares cosine similarity with PLDA scoring for speaker embeddings trained with large-margin losses, showing that such embeddings often make PLDA unnecessary and favor cosine scoring for speaker verification.
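As a rough illustration of what cosine scoring involves, the sketch below scores an enrollment embedding against a test embedding using only the Python standard library. The function names `cosine_score` and `verify`, and the threshold value, are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import math

def cosine_score(enroll, test):
    """Cosine similarity between two embeddings (higher suggests same speaker)."""
    dot = sum(a * b for a, b in zip(enroll, test))
    norm_e = math.sqrt(sum(a * a for a in enroll))
    norm_t = math.sqrt(sum(b * b for b in test))
    if norm_e == 0.0 or norm_t == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_e * norm_t)

def verify(enroll, test, threshold=0.5):
    # The threshold is an assumed placeholder; in practice it would be
    # tuned on development data for a target operating point.
    return cosine_score(enroll, test) >= threshold
```

Unlike PLDA, this scoring has no parameters to train: it depends only on the angle between the two embeddings and ignores their lengths.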

Contribution

The paper demonstrates that large-margin softmax losses produce embeddings with high intra-speaker compactness, which reduces the need for complex back-ends such as PLDA and highlights the effectiveness of cosine scoring.
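One simple way to put a number on intra-speaker compactness is the average cosine similarity between each embedding and the centroid of its speaker. The sketch below uses that proxy; it is an assumption made for illustration, and the paper may measure compactness differently. The name `intra_speaker_compactness` and the input layout (a dict mapping speaker ID to a list of embeddings) are hypothetical.

```python
import math

def _normalize(v):
    # Scale a vector to unit length; assumes the vector is non-zero.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def intra_speaker_compactness(embeddings_by_speaker):
    """Mean cosine similarity between each embedding and its speaker centroid.

    Values near 1.0 indicate tightly clustered speakers. This is an
    illustrative proxy, not necessarily the measure used in the paper.
    """
    sims = []
    for embs in embeddings_by_speaker.values():
        unit = [_normalize(e) for e in embs]
        dim = len(unit[0])
        mean = [sum(u[d] for u in unit) / len(unit) for d in range(dim)]
        centroid = _normalize(mean)
        sims.extend(sum(a * b for a, b in zip(u, centroid)) for u in unit)
    return sum(sims) / len(sims)
```

Under this proxy, a speaker whose embeddings point in nearly the same direction scores close to 1.0, while widely scattered embeddings score lower.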

Findings

01 Large-margin training reduces the need for PLDA back-end.

02 Cosine scoring outperforms PLDA in large-margin embedding scenarios.

03 Pre-processing techniques have limited impact on large-margin embeddings.

Abstract

The emergence of large-margin softmax cross-entropy losses in training deep speaker embedding neural networks has triggered a gradual shift from parametric back-ends to a simpler cosine similarity measure for speaker verification. Popular parametric back-ends include the probabilistic linear discriminant analysis (PLDA) and its variants. This paper investigates the properties of margin-based cross-entropy losses leading to such a shift and aims to find scoring back-ends best suited for speaker verification. In addition, we revisit the pre-processing techniques which have been widely used in the past and assess their effectiveness on large-margin embeddings. Experiments on the state-of-the-art ECAPA-TDNN networks trained with various large-margin softmax cross-entropy losses show a substantial increment in intra-speaker compactness making the conventional PLDA superfluous. In this…
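For readers unfamiliar with margin-based losses, the sketch below computes the loss for a single training sample under additive angular margin softmax, one widely used member of this family. Whether it matches the exact variants evaluated in the paper is not stated in this excerpt. The function name and the `margin` and `scale` defaults are assumptions chosen for illustration.

```python
import math

def aam_softmax_loss(cosines, target, margin=0.2, scale=30.0):
    """Cross-entropy with an additive angular margin on the target class.

    cosines[j] is cos(theta_j), the cosine between the L2-normalized
    embedding and the L2-normalized weight vector of class j.
    Simplified sketch: it does not special-case theta + margin > pi,
    which practical implementations usually handle.
    """
    logits = []
    for j, c in enumerate(cosines):
        c = max(-1.0, min(1.0, c))  # guard acos against rounding error
        if j == target:
            # Penalize the target class by widening its angle.
            c = math.cos(math.acos(c) + margin)
        logits.append(scale * c)
    # Numerically stable log-sum-exp over all class logits.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]
```

With `margin=0.0` this reduces to ordinary softmax cross-entropy on scaled cosines. A positive margin makes the target class harder to satisfy, which pushes embeddings of the same speaker closer together in angle.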

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing

Methods: Softmax