Spoofing-Robust Speaker Verification Using Parallel Embedding Fusion: BTU Speech Group's Approach for ASVspoof5 Challenge
Oğuzhan Kurnaz, Selim Can Demirtaş, Aykut Büker, Jagabandhu Mishra, Cemal Hanilçi

TL;DR
This paper presents a spoofing-robust speaker verification system that fuses embeddings from speaker verification and spoofing countermeasure models using two parallel DNNs whose output scores are averaged, which the authors report outperforms a single-DNN fusion baseline.
Contribution
The paper introduces a novel parallel DNN architecture for embedding fusion in speaker verification, in which two identical DNNs receive different inputs and their scores are averaged, improving robustness and accuracy over single-DNN fusion.
Findings
Outperforms traditional single DNN methods
Improves robustness against spoofing attacks
Achieves higher verification accuracy
Abstract
This paper introduces the parallel network-based spoofing-aware speaker verification (SASV) system developed by BTU Speech Group for the ASVspoof5 Challenge. The SASV system integrates automatic speaker verification (ASV) and countermeasure (CM) systems to enhance security against spoofing attacks. Our approach employs score and embedding fusion from ASV models (ECAPA-TDNN, WavLM) and CM models (AASIST). The fused embeddings are processed using a simple DNN structure, which is optimized with a combination of the recently proposed a-DCF loss and the binary cross-entropy (BCE) loss. We introduce a novel parallel network structure where two identical DNNs, fed with different inputs, independently process embeddings and produce SASV scores. The final SASV probability is derived by averaging these scores, enhancing robustness and accuracy. Experimental results demonstrate that the proposed parallel DNN structure outperforms traditional single DNN methods, offering…
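The parallel structure described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy embedding size, the layer widths, the random stand-in embeddings, and in particular the way inputs are split between the two branches (branch A gets one ASV model's enrollment and test embeddings plus a CM embedding, branch B gets the other ASV model's embeddings plus the same CM embedding). The abstract only states that two identical DNNs receive different inputs and that their SASV scores are averaged. The a-DCF loss term is omitted; only the BCE part is shown.

```python
import math
import random


def make_mlp(sizes, rng):
    """Create a small fully connected network as a list of (weights, biases) layers."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, x):
    """Forward pass: ReLU on hidden layers, sigmoid on the single output unit."""
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return 1.0 / (1.0 + math.exp(-x[0]))


def parallel_sasv_score(net_a, net_b, input_a, input_b):
    """Average the SASV probabilities of two independently evaluated branches."""
    return 0.5 * (forward(net_a, input_a) + forward(net_b, input_b))


def bce(p, label, eps=1e-7):
    """Binary cross-entropy for one trial (label 1 = bona fide target, else 0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


rng = random.Random(0)
DIM = 8  # toy embedding size (assumption; real embeddings are much larger)


def fake_embedding():
    """Random stand-in for an embedding produced by a pretrained model."""
    return [rng.gauss(0.0, 1.0) for _ in range(DIM)]


# Hypothetical inputs: two ASV embedding pairs and one CM embedding.
asv1_enrol, asv1_test = fake_embedding(), fake_embedding()
asv2_enrol, asv2_test = fake_embedding(), fake_embedding()
cm_test = fake_embedding()

# Embedding fusion by concatenation; the branch split is an assumption.
input_a = asv1_enrol + asv1_test + cm_test
input_b = asv2_enrol + asv2_test + cm_test

# Two networks with identical architecture but separate parameters.
sizes = [3 * DIM, 16, 1]
net_a = make_mlp(sizes, rng)
net_b = make_mlp(sizes, rng)

score = parallel_sasv_score(net_a, net_b, input_a, input_b)
loss = bce(score, 1)
print(f"SASV probability: {score:.4f}, BCE vs. target label: {loss:.4f}")
```

With untrained random weights the printed probability is meaningless; the sketch only shows the data flow, in which each branch sees a different fused input and the final decision score is the mean of the two branch outputs.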
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing
