Robust Text-Dependent Speaker Verification via Character-Level Information Preservation for the SdSV Challenge 2020

Sung Hwan Mun; Woo Hyun Kang; Min Hyun Han; Nam Soo Kim

arXiv:2010.11408 · eess.AS · October 23, 2020


TL;DR

This paper presents a robust text-dependent speaker verification system that preserves character-level information through novel pooling and score compensation methods built on a CTC-based ASR model, achieving state-of-the-art results in the SdSV Challenge 2020.

Contribution

The paper introduces new pooling and score compensation techniques that leverage a CTC-based ASR model to incorporate phrase-dependent information into speaker verification.
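How exactly the authors consume the ASR output is not described on this page. As general background, a CTC model emits per-frame label posteriors, and best-path (greedy) decoding turns them into a character sequence together with the frames each character occupies, which is one way character-level segments could be obtained. The sketch below assumes label index 0 is the CTC blank and that posteriors arrive as plain lists; `ctc_greedy_segments` and the two-letter alphabet are hypothetical, not the authors' code.

```python
from itertools import groupby
from typing import List, Sequence, Tuple

BLANK = 0  # assumption: label index 0 is the CTC blank symbol


def ctc_greedy_segments(
    frame_posteriors: Sequence[Sequence[float]],
) -> List[Tuple[int, int, int]]:
    """Best-path CTC decoding that also records frame spans.

    Returns (label, start_frame, end_frame_exclusive) for each emitted
    symbol after consecutive repeats are merged and blanks are dropped.
    """
    # Most probable label at every frame (the "best path").
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_posteriors]
    segments = []
    t = 0
    for label, run in groupby(best):
        length = sum(1 for _ in run)
        # groupby merges consecutive repeats; a blank run between two identical
        # labels keeps them as two separate symbols, as CTC requires.
        if label != BLANK:
            segments.append((label, t, t + length))
        t += length
    return segments


# Usage with a hypothetical alphabet {1: "a", 2: "b"} and toy posteriors.
alphabet = {1: "a", 2: "b"}
posteriors = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.6, 0.3],
    [0.1, 0.2, 0.7],
    [0.8, 0.1, 0.1],
]
segs = ctc_greedy_segments(posteriors)
print("".join(alphabet[label] for label, _, _ in segs))  # aab
```

Keeping the frame spans, not just the decoded string, is the design choice that matters here: spans are what would let frame-level speaker features be grouped by character.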

Findings

1. Improved verification performance, reaching a MinDCF of 0.0785 and an EER of 2.23%.

2. Fusion of multiple systems yields the best results.

3. The proposed methods outperform conventional pooling techniques.
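The first finding uses two standard verification metrics. EER is the error rate at the threshold where the miss and false-alarm rates are equal; MinDCF is a normalized, unitless detection cost minimized over thresholds. The sketch below computes both from target and non-target trial scores. The cost parameters (P_target = 0.01, C_miss = 10, C_fa = 1) are an assumption about the SdSV 2020 evaluation plan and should be checked against it, and the EER is the simple closest-point approximation rather than an interpolated value.

```python
from typing import List, Sequence, Tuple


def det_points(
    targets: Sequence[float], nontargets: Sequence[float]
) -> List[Tuple[float, float]]:
    """(P_miss, P_fa) at every threshold; a trial is accepted iff score >= threshold."""
    # Include +inf so the "reject everything" operating point is covered.
    thresholds = sorted(set(targets) | set(nontargets)) + [float("inf")]
    points = []
    for thr in thresholds:
        p_miss = sum(s < thr for s in targets) / len(targets)
        p_fa = sum(s >= thr for s in nontargets) / len(nontargets)
        points.append((p_miss, p_fa))
    return points


def eer(targets: Sequence[float], nontargets: Sequence[float]) -> float:
    """Approximate EER: mean of P_miss and P_fa where the two are closest."""
    p_miss, p_fa = min(det_points(targets, nontargets), key=lambda p: abs(p[0] - p[1]))
    return (p_miss + p_fa) / 2


def min_dcf(
    targets: Sequence[float],
    nontargets: Sequence[float],
    p_target: float = 0.01,  # assumed SdSV 2020 setting
    c_miss: float = 10.0,    # assumed SdSV 2020 setting
    c_fa: float = 1.0,       # assumed SdSV 2020 setting
) -> float:
    """Minimum detection cost, normalized by the best trivial system (unitless)."""
    c_default = min(c_miss * p_target, c_fa * (1 - p_target))
    best = min(
        c_miss * p_target * pm + c_fa * (1 - p_target) * pf
        for pm, pf in det_points(targets, nontargets)
    )
    return best / c_default


# Toy trial scores, made up for illustration (not challenge data).
tgt, non = [0.8, 0.4], [0.6, 0.2]
print(eer(tgt, non), min_dcf(tgt, non))
```

Because of the normalization, a system that always rejects scores a MinDCF of 1, which is why a value like 0.0785 is reported as a plain number rather than a percentage.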

Abstract

This paper describes our submission to Task 1 of the Short-duration Speaker Verification (SdSV) Challenge 2020. Task 1 is a text-dependent speaker verification task in which both the speaker and the phrase must be verified. The submitted systems were composed of TDNN-based and ResNet-based front-end architectures, in which the frame-level features were aggregated with various pooling methods (e.g., statistical, self-attentive, and GhostVLAD pooling). Although the conventional pooling methods provide embeddings with a sufficient amount of speaker-dependent information, our experiments show that these embeddings often lack phrase-dependent information. To mitigate this problem, we propose new pooling and score compensation methods that leverage a CTC-based automatic speech recognition (ASR) model to take the lexical content into account. Both methods showed improvement over the…
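The abstract names statistical and self-attentive pooling among the conventional aggregation methods. As a generic illustration, not the authors' configuration, the sketch below maps a variable-length sequence of frame-level features to a fixed-size vector in both ways. The single linear attention scorer `w` is a simplifying assumption, and GhostVLAD is omitted for brevity.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]


def statistics_pooling(frames: Sequence[Frame]) -> List[float]:
    """Concatenate the per-dimension mean and standard deviation over time."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    std = [
        math.sqrt(sum((f[d] - mean[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    return mean + std


def self_attentive_pooling(frames: Sequence[Frame], w: Frame) -> List[float]:
    """Attention-weighted mean of the frames.

    Each frame gets a scalar score w . x_t, and a softmax over time turns the
    scores into weights. `w` stands in for learned parameters (an assumed
    single linear scorer; practical systems usually use a small MLP).
    """
    scores = [sum(wi * xi for wi, xi in zip(w, f)) for f in frames]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(frames[0])
    return [sum(a * f[d] for a, f in zip(weights, frames)) for d in range(dim)]


# Three frames of 2-dimensional features (toy values).
frames = [[1.0, 2.0], [3.0, 2.0], [2.0, 5.0]]
print(statistics_pooling(frames))                  # mean followed by std
print(self_attentive_pooling(frames, [0.0, 0.0]))  # uniform weights give the plain mean
```

Both functions discard frame order and summarize the whole utterance at once, which is consistent with the abstract's observation that such embeddings carry speaker identity well but can lose phrase-dependent information.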
