Spoofing-Robust Speaker Verification Using Parallel Embedding Fusion: BTU Speech Group's Approach for ASVspoof5 Challenge

Oğuzhan Kurnaz; Selim Can Demirtaş; Aykut Büker; Jagabandhu Mishra; Cemal Hanilçi

arXiv:2408.15877 · eess.AS · November 4, 2024

Open Access

TL;DR

This paper presents a spoofing-aware speaker verification (SASV) system that fuses ASV and countermeasure (CM) embeddings using two parallel DNNs and averages their scores; the authors report that it outperforms a single-DNN approach.

Contribution

The paper introduces a parallel network structure for embedding fusion in speaker verification: two identical DNNs, fed with different inputs, each produce a SASV score, and the two scores are averaged to give the final SASV probability.

Findings

01 Outperforms traditional single DNN methods

02 Improves robustness against spoofing attacks

03 Achieves higher verification accuracy

Abstract

This paper introduces the parallel network-based spoofing-aware speaker verification (SASV) system developed by BTU Speech Group for the ASVspoof5 Challenge. The SASV system integrates automatic speaker verification (ASV) and countermeasure (CM) systems to enhance security against spoofing attacks. Our approach employs score and embedding fusion from ASV models (ECAPA-TDNN, WavLM) and CM models (AASIST). The fused embeddings are processed using a simple DNN structure, and the model is optimized with a combination of the recently proposed a-DCF loss and the BCE loss. We introduce a novel parallel network structure in which two identical DNNs, fed with different inputs, independently process embeddings and produce SASV scores. The final SASV probability is derived by averaging these scores, enhancing robustness and accuracy. Experimental results demonstrate that the proposed parallel DNN structure outperforms traditional single DNN methods, offering…
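
The parallel structure described in the abstract (two identical DNNs, different inputs, averaged scores) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The embedding size, the hidden width, the untrained random weights, and in particular the way inputs are split between the two branches (one ASV model's enrollment and test embeddings plus a CM embedding per branch) are all assumptions made for illustration; the abstract does not specify them. Training with the a-DCF and BCE losses is omitted.

```python
import math
import random


def make_dnn(in_dim, hidden_dim, rng):
    """Build one small fully connected network with random, untrained weights."""
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(hidden_dim)]
    b1 = [0.0] * hidden_dim
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
    return {"w1": w1, "b1": b1, "w2": w2, "b2": 0.0}


def dnn_score(net, x):
    """Forward pass: linear -> ReLU -> linear -> sigmoid, giving a score in (0, 1)."""
    hidden = [
        max(0.0, sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(net["w1"], net["b1"])
    ]
    logit = sum(w * h for w, h in zip(net["w2"], hidden)) + net["b2"]
    return 1.0 / (1.0 + math.exp(-logit))


def parallel_sasv_score(net_a, net_b, input_a, input_b):
    """Run both branches independently and average their scores."""
    score_a = dnn_score(net_a, input_a)
    score_b = dnn_score(net_b, input_b)
    return score_a, score_b, 0.5 * (score_a + score_b)


rng = random.Random(0)
EMB_DIM = 4  # toy size; real embeddings are far larger


def toy_embedding():
    """Random stand-in for an embedding extracted by a pretrained model."""
    return [rng.gauss(0.0, 1.0) for _ in range(EMB_DIM)]


# Hypothetical inputs: stand-ins for two ASV models and one CM model.
asv1_enroll, asv1_test = toy_embedding(), toy_embedding()
asv2_enroll, asv2_test = toy_embedding(), toy_embedding()
cm_test = toy_embedding()

# Assumed split: each branch sees one ASV model's pair plus the CM embedding.
input_a = asv1_enroll + asv1_test + cm_test
input_b = asv2_enroll + asv2_test + cm_test

# Same architecture for both branches, separate weights.
net_a = make_dnn(len(input_a), 8, rng)
net_b = make_dnn(len(input_b), 8, rng)

score_a, score_b, sasv_score = parallel_sasv_score(net_a, net_b, input_a, input_b)
print(f"branch A: {score_a:.3f}, branch B: {score_b:.3f}, averaged: {sasv_score:.3f}")
```

Because the final score is a plain mean, it always lies between the two branch scores, so a single branch that is badly wrong on a trial is damped by the other rather than deciding the outcome alone.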

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing