Can spoofing countermeasure and speaker verification systems be jointly optimised?
Wanying Ge, Hemlata Tak, Massimiliano Todisco, Nicholas Evans

TL;DR
This paper investigates the joint optimization of spoofing countermeasures and speaker verification systems, showing that while it can degrade individual component performance, it enhances their combined effectiveness in spoofing-aware speaker verification.
Contribution
It demonstrates that joint optimization, despite some performance trade-offs, improves the complementarity of CM and ASV systems, leading to better SASV performance with limited auxiliary data.
Findings
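Joint optimization reduces the SASV equal error rate (EER) by 27% relative.
With a modest quantity of auxiliary data from new speakers, it improves the complementarity of the CM and ASV sub-systems; earlier work had shown a tendency to over-fit to speakers.
The separate CM and ASV sub-systems perform worse after joint optimization than when optimised independently.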
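The equal error rate behind the 27% figure can be illustrated with a minimal sketch. It assumes that higher scores mean "accept" and that the non-target set pools every trial that should be rejected; the function names and the numbers in the usage lines are hypothetical and are not results from the paper.

```python
import numpy as np

def compute_eer(target_scores, nontarget_scores):
    """Approximate the EER by sweeping thresholds over the observed scores."""
    target_scores = np.asarray(target_scores, dtype=float)
    nontarget_scores = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    # False rejection rate: fraction of targets scoring below the threshold.
    frr = np.array([(target_scores < t).mean() for t in thresholds])
    # False acceptance rate: fraction of non-targets at or above the threshold.
    far = np.array([(nontarget_scores >= t).mean() for t in thresholds])
    # The EER is taken where the two error rates are closest to each other.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0

def relative_reduction(baseline, improved):
    """Relative change, e.g. 0.27 corresponds to a 27% relative reduction."""
    return (baseline - improved) / baseline

# Hypothetical scores, for illustration only.
eer = compute_eer([0.1, 0.9], [0.2, 0.8])
# Hypothetical EER values chosen so that the relative reduction is 27%.
gain = relative_reduction(2.0, 1.46)
```

A relative reduction is measured against the baseline value, so a 27% relative reduction is not the same as a drop of 27 percentage points.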
Abstract
Spoofing countermeasure (CM) and automatic speaker verification (ASV) sub-systems can be used in tandem with a backend classifier as a solution to the spoofing aware speaker verification (SASV) task. The two sub-systems are typically trained independently to solve different tasks. While our previous work demonstrated the potential of joint optimisation, it also showed a tendency to over-fit to speakers and a lack of sub-system complementarity. Using only a modest quantity of auxiliary data collected from new speakers, we show that joint optimisation degrades the performance of separate CM and ASV sub-systems, but that it nonetheless improves complementarity, thereby delivering superior SASV performance. Using standard SASV evaluation data and protocols, joint optimisation reduces the equal error rate by 27% relative to performance obtained using fixed, independently-optimised…
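The tandem arrangement described in the abstract, two sub-system scores combined by a backend classifier, might be sketched as follows. This is an assumption-laden toy rather than the authors' backend: the logistic-regression fusion, the function names, and the three idealised trials are all hypothetical, and the sub-system scores are treated as fixed inputs.

```python
import numpy as np

def train_backend(scores, labels, lr=0.5, epochs=2000):
    """Fit logistic-regression weights on [ASV score, CM score] pairs."""
    X = np.hstack([scores, np.ones((scores.shape[0], 1))])  # append bias column
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))          # predicted accept probability
        w -= lr * X.T @ (p - labels) / len(labels)  # gradient step on the log loss
    return w

def backend_score(scores, w):
    """Return one fused score (a logit) per trial; positive means accept."""
    X = np.hstack([scores, np.ones((scores.shape[0], 1))])
    return X @ w

# Hypothetical idealised trials: columns are [ASV score, CM score].
toy_scores = np.array([
    [1.0, 1.0],  # target speaker, bona fide speech: accept
    [0.0, 1.0],  # different speaker, bona fide speech: reject
    [1.0, 0.0],  # spoofed speech resembling the target: reject
])
toy_labels = np.array([1.0, 0.0, 0.0])

weights = train_backend(toy_scores, toy_labels)
fused = backend_score(toy_scores, weights)
```

In this toy, a trial is accepted only when both sub-system scores are high, which is why the two sub-systems need to be complementary rather than redundant.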
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
Methods: Attentive Walk-Aggregating Graph Neural Network
