The SVASR System for Text-dependent Speaker Verification (TdSV) AAIC Challenge 2024

Mohammadreza Molavi; Reza Khodadadi

arXiv:2411.16276 · cs.SD · November 26, 2024

TL;DR

This paper presents a text-dependent speaker verification system that combines ASR-based speech content validation with speaker embedding fusion, ranking 2nd in the TdSV Challenge 2024 with a normalized min-DCF of 0.0452.

Contribution

The paper introduces a pipeline that pairs a Fast-Conformer ASR module, used to validate spoken content, with a feature fusion method that combines wav2vec-BERT and ReDimNet speaker embeddings into a single speaker representation.

Findings

1. Achieved a normalized min-DCF of 0.0452 on the challenge test set, ranking 2nd.
2. Filters out Target-Wrong (TW) and Impostor-Wrong (IW) trials through ASR-based speech content validation.
3. Demonstrates improved speaker verification performance with combined embeddings.
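
For readers unfamiliar with the metric in the first finding, the following is a minimal sketch of how a normalized minimum detection cost (min-DCF) is commonly computed from trial scores. The function name and the default values of `p_target`, `c_miss`, and `c_fa` are illustrative assumptions; this page does not state the challenge's actual cost parameters.

```python
def normalized_min_dcf(target_scores, nontarget_scores,
                       p_target=0.01, c_miss=1.0, c_fa=1.0):
    """Normalized minimum detection cost over all score thresholds.

    The cost parameters are placeholder assumptions, not the challenge's
    official settings. A trial is accepted when its score >= threshold.
    """
    # Candidate thresholds: every observed score, plus +inf (reject all).
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    thresholds.append(float("inf"))

    best = float("inf")
    for t in thresholds:
        # Miss: a target trial scored below the threshold.
        p_miss = sum(s < t for s in target_scores) / len(target_scores)
        # False alarm: a non-target trial scored at or above the threshold.
        p_fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        dcf = c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
        best = min(best, dcf)

    # Normalize by the cost of the best trivial system
    # (always accept or always reject).
    return best / min(c_miss * p_target, c_fa * (1.0 - p_target))
```

This version scans every threshold and is quadratic in the number of trials, which is fine for illustration; sorting the scores once and sweeping would be the usual choice for a full evaluation set. A value of 0 means the two score distributions are perfectly separable, and a value of 1 means the system is no better than a trivial accept-all or reject-all decision.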

Abstract

This paper introduces an efficient and accurate pipeline for text-dependent speaker verification (TDSV), designed to address the need for high-performance biometric systems. The proposed system incorporates a Fast-Conformer-based ASR module to validate speech content, filtering out Target-Wrong (TW) and Impostor-Wrong (IW) trials. For speaker verification, we propose a feature fusion approach that combines speaker embeddings extracted from wav2vec-BERT and ReDimNet models to create a unified speaker representation. This system achieves competitive results on the TDSV 2024 Challenge test set, with a normalized min-DCF of 0.0452 (rank 2), highlighting its effectiveness in balancing accuracy and robustness.
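
The two-stage flow described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the transcript is compared with the expected phrase, how the two embeddings are fused, or how trials are scored, so the exact-match check, the concatenation of L2-normalized embeddings, and the cosine scoring below are all assumptions, and every function name is hypothetical. The ASR transcript and the two speaker embeddings are taken as inputs rather than computed by the actual models.

```python
import math
import re


def normalize_text(text):
    """Lowercase, drop punctuation, and split into tokens."""
    return re.sub(r"[^\w\s]", "", text.lower()).split()


def content_matches(transcript, expected_phrase):
    # Assumed rule: exact token match after normalization.
    return normalize_text(transcript) == normalize_text(expected_phrase)


def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(emb_a, emb_b):
    # Assumed fusion: concatenate the two L2-normalized embeddings.
    return l2_normalize(emb_a) + l2_normalize(emb_b)


def cosine(u, v):
    u, v = l2_normalize(u), l2_normalize(v)
    return sum(a * b for a, b in zip(u, v))


def verify_trial(transcript, expected_phrase, enroll_embs, test_embs,
                 reject_score=-1.0):
    """Score one trial; each *_embs is a pair (embedding_a, embedding_b).

    Stage 1 rejects wrong-phrase trials (TW and IW) outright.
    Stage 2 scores the remaining trials with the fused embeddings.
    """
    if not content_matches(transcript, expected_phrase):
        return reject_score
    return cosine(fuse(*enroll_embs), fuse(*test_embs))
```

In this sketch a wrong-phrase trial receives the lowest possible cosine score, so it is rejected at any threshold; a real system might instead apply a softer penalty or tolerate small ASR errors.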

Taxonomy

Topics: Speech Recognition and Synthesis