Baseline Systems for the First Spoofing-Aware Speaker Verification Challenge: Score and Embedding Fusion

Hye-jin Shim; Hemlata Tak; Xuechen Liu; Hee-Soo Heo; Jee-weon Jung; Joon Son Chung; Soo-Whan Chung; Ha-Jin Yu; Bong-Jin Lee; Massimiliano Todisco; Héctor Delgado; Kong Aik Lee; Md Sahidullah; Tomi Kinnunen; Nicholas Evans

arXiv:2204.09976·cs.SD·April 22, 2022·1 cites

TL;DR

This paper evaluates the integration of speaker verification and spoofing countermeasures, demonstrating that fusion strategies significantly improve system robustness against spoofing attacks, and introduces a new challenge for benchmarking such integrated solutions.

Contribution

It presents a comprehensive analysis of score and embedding fusion methods for spoofing-aware speaker verification and introduces the SASV challenge for standardized benchmarking.

Findings

01

Fusion reduces the EER on the spoofing-extended protocol from 23.83% to 1.71% (score-sum) and 6.37% (deep neural network-based fusion).

02

Integrated ASV and CM systems are substantially more reliable than a standalone ASV system when spoofed trials are included.

03

The SASV challenge promotes development of more robust speaker verification methods.

Abstract

Deep learning has brought impressive progress in the study of both automatic speaker verification (ASV) and spoofing countermeasures (CM). Although solutions are mutually dependent, they have typically evolved as standalone sub-systems whereby CM solutions are usually designed for a fixed ASV system. The work reported in this paper aims to gauge the improvements in reliability that can be gained from their closer integration. Results derived using the popular ASVspoof2019 dataset indicate that the equal error rate (EER) of a state-of-the-art ASV system degrades from 1.63% to 23.83% when the evaluation protocol is extended with spoofed trials. However, even the straightforward integration of ASV and CM systems in the form of score-sum and deep neural network-based fusion strategies reduces the EER to 1.71% and 6.37%, respectively. The new Spoofing-Aware…
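
The score-sum idea and the EER metric mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' baseline code: the trial scores are invented toy values, the function names `compute_eer` and `score_sum_fusion` are hypothetical, and the sketch assumes that both scores are already on comparable scales and that spoofed trials are counted as negatives alongside bona fide non-target trials.

```python
# Toy illustration of score-sum fusion and EER (all values are made up).

def compute_eer(target_scores, negative_scores):
    """Approximate the EER by sweeping thresholds over the observed scores
    and returning the mean of FAR and FRR where they are closest."""
    thresholds = sorted(set(target_scores) | set(negative_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection: a target trial scoring below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: a negative trial scoring at or above the threshold.
        far = sum(s >= t for s in negative_scores) / len(negative_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


def score_sum_fusion(asv_score, cm_score):
    """Fuse by simple addition (assumes both scores share a similar range)."""
    return asv_score + cm_score


# Hypothetical (asv_score, cm_score) pairs for each trial type.
targets = [(0.8, 0.9), (0.7, 0.8), (0.6, 0.9)]      # same speaker, bona fide
nontargets = [(0.1, 0.9), (0.2, 0.8)]               # other speaker, bona fide
spoofs = [(0.75, 0.05), (0.65, 0.1)]                # high ASV score, low CM score
negatives = nontargets + spoofs

# ASV alone: spoofed trials look like the target speaker and inflate the EER.
asv_eer = compute_eer([a for a, _ in targets], [a for a, _ in negatives])

# Score-sum fusion: the low CM score pulls spoofed trials below the targets.
fused_eer = compute_eer(
    [score_sum_fusion(a, c) for a, c in targets],
    [score_sum_fusion(a, c) for a, c in negatives],
)

print(f"ASV-only EER: {asv_eer:.4f}")
print(f"Score-sum EER: {fused_eer:.4f}")
```

On this toy data the fused EER is lower than the ASV-only EER because the spoofed trials receive low CM scores. The sketch does not reproduce the figures reported in the abstract.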

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Voice and Speech Disorders