Cross-Lingual Speaker Verification with Domain-Balanced Hard Prototype Mining and Language-Dependent Score Normalization

Jenthe Thienpondt; Brecht Desplanques; Kris Demuynck

arXiv:2007.07689·eess.AS·November 3, 2020

TL;DR

This paper presents a cross-lingual speaker verification method that combines domain-balanced hard prototype mining with language-dependent score normalization. It was the top-scoring submission for the text-independent task of the SdSV Challenge 2020.

Contribution

It introduces domain-balanced hard prototype mining to fine-tune the ECAPA-TDNN speaker embedding extractor, and a language-dependent s-norm score normalization to improve the scoring of cross-lingual trials.
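
To make the mining idea concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation. It assumes the AAM-softmax weight matrix is available as one prototype row per training speaker, that each speaker carries a domain label, and that "domain-balanced" means drawing the same number of hard speakers from every domain. The function names `hardest_neighbours` and `domain_balanced_batch` and the parameter `k` are hypothetical.

```python
import numpy as np


def hardest_neighbours(prototypes, anchor, candidates, k):
    """Return the k candidate speakers whose prototypes are most similar
    (by cosine similarity) to the anchor speaker's prototype."""
    normed = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sims = normed[candidates] @ normed[anchor]
    order = np.argsort(-sims)[:k]  # most similar first
    return candidates[order]


def domain_balanced_batch(prototypes, speaker_domains, k, rng):
    """Pick k mutually confusable speakers per domain (assumed balancing rule).

    For every domain, a random anchor speaker is drawn and its k nearest
    prototypes within that domain (anchor included) are added to the batch.
    """
    speaker_domains = np.asarray(speaker_domains)
    batch = []
    for domain in np.unique(speaker_domains):
        candidates = np.flatnonzero(speaker_domains == domain)
        anchor = rng.choice(candidates)
        picked = hardest_neighbours(prototypes, anchor, candidates, k)
        batch.extend(picked.tolist())
    return batch
```

A training loop would then sample utterances from the returned speaker indices. Speakers with nearby prototypes are the ones the classifier confuses most easily, which is what makes such batches challenging.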

Findings

1. Achieved a MinDCF of 0.065 and an EER of 1.45% on the SdSVC dataset.
2. Demonstrated the effectiveness of domain-balanced training and language-aware score normalization.
3. Was the top-scoring submission for the text-independent task of the SdSV Challenge 2020.

Abstract

In this paper we describe the top-scoring IDLab submission for the text-independent task of the Short-duration Speaker Verification (SdSV) Challenge 2020. The main difficulty of the challenge exists in the large degree of varying phonetic overlap between the potentially cross-lingual trials, along with the limited availability of in-domain DeepMine Farsi training data. We introduce domain-balanced hard prototype mining to fine-tune the state-of-the-art ECAPA-TDNN x-vector based speaker embedding extractor. The sample mining technique efficiently exploits speaker distances between the speaker prototypes of the popular AAM-softmax loss function to construct challenging training batches that are balanced on the domain-level. To enhance the scoring of cross-lingual trials, we propose a language-dependent s-norm score normalization. The imposter cohort only contains data from the Farsi…
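
The s-norm mentioned in the abstract can be sketched as follows. Standard symmetric normalization averages two z-scores of the raw trial score: one against the enrollment side's imposter-cohort scores and one against the test side's. The excerpt above is truncated before it specifies how the cohorts are chosen, so the per-language selection in `cohort_for_language` is only an illustrative assumption, and all function names here are hypothetical.

```python
import numpy as np


def cosine_scores(emb, cohort):
    """Cosine similarity between one embedding and every cohort embedding."""
    emb = emb / np.linalg.norm(emb)
    cohort = cohort / np.linalg.norm(cohort, axis=1, keepdims=True)
    return cohort @ emb


def s_norm(raw_score, enroll_emb, test_emb, enroll_cohort, test_cohort):
    """Symmetric score normalization with a separate cohort per trial side."""
    e_scores = cosine_scores(enroll_emb, enroll_cohort)
    t_scores = cosine_scores(test_emb, test_cohort)
    z_enroll = (raw_score - e_scores.mean()) / e_scores.std()
    z_test = (raw_score - t_scores.mean()) / t_scores.std()
    return 0.5 * (z_enroll + z_test)


def cohort_for_language(cohort_embs, cohort_langs, lang):
    """Keep only cohort embeddings labelled with the given language
    (assumed selection rule, not taken from the paper)."""
    mask = np.asarray(cohort_langs) == lang
    return cohort_embs[mask]
```

Passing a language-specific cohort to each side lets the normalization statistics reflect the language of that side of the trial, which is the intuition behind a language-dependent variant.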
