The SVASR System for Text-dependent Speaker Verification (TdSV) AAIC Challenge 2024
Mohammadreza Molavi, Reza Khodadadi

TL;DR
This paper presents a text-dependent speaker verification system that combines ASR-based speech content validation with the fusion of speaker embeddings from two models, ranking 2nd in the TdSV AAIC Challenge 2024.
Contribution
It introduces a pipeline that pairs a Fast-Conformer ASR module, which checks that the spoken phrase matches the expected one, with a feature fusion method that combines wav2vec-BERT and ReDimNet speaker embeddings into a single representation.
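The content-validation step can be illustrated with a minimal Python sketch. It assumes the Fast-Conformer ASR has already produced a transcript string for each test utterance. The text normalization, the exact-match rule, and the `REJECT_SCORE` sentinel are hypothetical choices for illustration, not the authors' implementation. A trial whose transcript does not match the expected phrase (a Target-Wrong or Impostor-Wrong trial) is forced to a rejecting score, and only the remaining trials keep their speaker-verification score.

```python
import re

# Hypothetical sentinel: far below any cosine score, so the trial is always rejected.
REJECT_SCORE = -1.0e6


def normalize_text(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def content_matches(transcript: str, expected_phrase: str) -> bool:
    """Hypothetical rule: accept only an exact match after normalization."""
    return normalize_text(transcript) == normalize_text(expected_phrase)


def validated_score(transcript: str, expected_phrase: str, speaker_score: float) -> float:
    """Return the speaker score if the phrase is correct, otherwise force rejection."""
    if not content_matches(transcript, expected_phrase):
        return REJECT_SCORE
    return speaker_score


# Usage with made-up transcripts and scores:
print(validated_score("My voice is my password.", "my voice is my password", 0.82))  # 0.82
print(validated_score("Today is a nice day", "my voice is my password", 0.82))       # -1000000.0
```

Because ASR output is imperfect, a real system might accept near matches (for example, below an edit-distance threshold) instead of requiring an exact match; the threshold would then be another tunable assumption.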
Findings
Achieved a normalized min-DCF of 0.0452, ranking 2nd in the challenge.
Uses speech content validation to filter out Target-Wrong (TW) and Impostor-Wrong (IW) trials.
Demonstrates improved speaker verification performance with combined embeddings.
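For reference, a normalized min-DCF like the one reported above is the minimum over thresholds θ of DCF(θ) = C_miss · P_miss(θ) · P_target + C_fa · P_fa(θ) · (1 − P_target), divided by min(C_miss · P_target, C_fa · (1 − P_target)). The Python sketch below computes it. The cost parameters (`p_target=0.01`, `c_miss=10`, `c_fa=1`) are an assumption borrowed from earlier SdSV-style text-dependent evaluations and may differ from the official AAIC 2024 evaluation plan; this is not the challenge's scoring script.

```python
import numpy as np


def normalized_min_dcf(scores, labels, p_target=0.01, c_miss=10.0, c_fa=1.0):
    """Minimum of the normalized detection cost over all score thresholds.

    labels: 1 for target trials, 0 for non-target trials.
    Cost parameters are assumed defaults, not confirmed challenge values.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    n_tar, n_non = labels.sum(), (~labels).sum()

    # Sort ascending; threshold position i rejects the i lowest-scoring trials.
    order = np.argsort(scores, kind="mergesort")
    s, lab = scores[order], labels[order]
    p_miss = np.concatenate(([0], np.cumsum(lab))) / n_tar
    p_fa = 1.0 - np.concatenate(([0], np.cumsum(~lab))) / n_non

    # Only positions between distinct scores are realizable thresholds.
    valid = np.ones(len(s) + 1, dtype=bool)
    valid[1:-1] = s[1:] != s[:-1]

    dcf = c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa
    c_default = min(c_miss * p_target, c_fa * (1.0 - p_target))
    return float(dcf[valid].min() / c_default)


# Usage with toy scores where targets and non-targets are perfectly separated:
print(normalized_min_dcf([0.9, 0.8, 0.1, 0.2], [1, 1, 0, 0]))  # 0.0
```

A value of 1.0 means the system does no better than always accepting or always rejecting, so lower is better.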
Abstract
This paper introduces an efficient and accurate pipeline for text-dependent speaker verification (TdSV), designed to address the need for high-performance biometric systems. The proposed system incorporates a Fast-Conformer-based ASR module to validate speech content, filtering out Target-Wrong (TW) and Impostor-Wrong (IW) trials. For speaker verification, we propose a feature fusion approach that combines speaker embeddings extracted from wav2vec-BERT and ReDimNet models to create a unified speaker representation. The system achieves competitive results on the TdSV 2024 Challenge test set, with a normalized min-DCF of 0.0452 (rank 2), highlighting its balance of accuracy and robustness.
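The abstract does not specify the fusion operator, so the following Python sketch assumes one common choice: L2-normalize each model's embedding, scale it by the square root of a hypothetical weight, concatenate the two, and score trials with cosine similarity. The embedding sizes and the `weight_a` parameter are illustrative assumptions, not values from the paper.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale a vector to unit length (guarding against a zero vector)."""
    x = np.asarray(x, dtype=float)
    return x / max(np.linalg.norm(x), eps)


def fuse_embeddings(emb_a, emb_b, weight_a=0.5):
    """Concatenate two L2-normalized embeddings with sqrt weights (assumed fusion rule)."""
    a = np.sqrt(weight_a) * l2_normalize(emb_a)
    b = np.sqrt(1.0 - weight_a) * l2_normalize(emb_b)
    return np.concatenate([a, b])  # already unit norm: weight_a + (1 - weight_a) = 1


def cosine_score(x, y):
    """Cosine similarity between two vectors."""
    return float(np.dot(l2_normalize(x), l2_normalize(y)))


# Usage with random stand-ins for wav2vec-BERT (a) and ReDimNet (b) embeddings:
rng = np.random.default_rng(0)
enroll = fuse_embeddings(rng.normal(size=256), rng.normal(size=192))
test = fuse_embeddings(rng.normal(size=256), rng.normal(size=192))
print(cosine_score(enroll, test))
```

With this scaling the fused vector has unit norm, so the cosine score of two fused embeddings equals `weight_a * cos_a + (1 - weight_a) * cos_b`, a weighted average of the per-model cosine scores. A learned projection over the concatenated vector would be a different, trainable fusion choice.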
Taxonomy
Topics: Speech Recognition and Synthesis
