ELEAT-SAGA: Early & Late Integration with Evading Alternating Training for Spoof-Robust Speaker Verification
Amro Asali, Yehuda Ben-Shimol, Itshak Lapidot

TL;DR
This paper introduces SASV-SAGA, a novel spoofing-robust automatic speaker verification (SASV) architecture that uses score-aware gated attention (SAGA) and alternating training strategies to make speaker verification systems more robust against spoofing attacks.
Contribution
The paper proposes a new SASV model built on score-aware gated attention, which modulates speaker embeddings according to countermeasure (CM) scores, and introduces evading alternating training (EAT) to improve robustness against spoofing.
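The summary does not say how the gate is computed. The sketch below assumes an element-wise sigmoid gate driven by a scalar CM score; this is our assumption, not the authors' published formulation. It shows the general idea of score-aware gating, where a speaker embedding (standing in for an ECAPA-TDNN output) is scaled dimension by dimension according to a CM score (standing in for an AASIST output). `score_aware_gate`, `l2_norm`, the gate weights and biases, the toy numbers, and the convention that a higher CM score means "more bona fide" are all hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def score_aware_gate(
    embedding: Sequence[float],
    cm_score: float,
    gate_weights: Sequence[float],
    gate_biases: Sequence[float],
) -> List[float]:
    """Scale each embedding dimension by a gate driven by the CM score.

    Hypothetical parameterisation (not taken from the paper):
        g_i = sigmoid(w_i * cm_score + b_i),   output_i = g_i * embedding_i
    In a trained model, w and b would be learned parameters.
    """
    if not (len(embedding) == len(gate_weights) == len(gate_biases)):
        raise ValueError("embedding, gate_weights and gate_biases must match in length")
    return [
        sigmoid(w * cm_score + b) * e
        for e, w, b in zip(embedding, gate_weights, gate_biases)
    ]


def l2_norm(vector: Sequence[float]) -> float:
    """Euclidean length, used here to compare how strongly a gate attenuates."""
    return math.sqrt(sum(x * x for x in vector))


# Toy usage with made-up numbers; a higher CM score is assumed to mean "more bona fide".
speaker_embedding = [0.5, -1.0, 2.0]  # stands in for an ECAPA-TDNN embedding
weights, biases = [2.0, 2.0, 2.0], [0.0, 0.0, 0.0]

bona_fide_like = score_aware_gate(speaker_embedding, 3.0, weights, biases)
spoof_like = score_aware_gate(speaker_embedding, -3.0, weights, biases)
print(l2_norm(bona_fide_like) > l2_norm(spoof_like))  # True
```

With these made-up parameters, a spoof-like (low) CM score drives every gate toward 0 and attenuates the embedding, while a bona fide-like score leaves it nearly unchanged. In a real model the gate parameters would presumably be learned jointly with the downstream SASV classifier, which this sketch omits.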
Findings
Achieved an SASV-EER of 1.22% on the ASVspoof 2019 Logical Access (LA) dataset.
Showed significant improvements over baseline methods.
Validated the effectiveness of the attention mechanisms and training strategies.
Abstract
Spoofing-robust automatic speaker verification (SASV) seeks to build automatic speaker verification systems that are robust against both zero-effort impostor attacks and sophisticated spoofing techniques such as voice conversion (VC) and text-to-speech (TTS). In this work, we propose a novel SASV architecture that introduces score-aware gated attention (SAGA), SASV-SAGA, enabling dynamic modulation of speaker embeddings based on countermeasure (CM) scores. By integrating speaker embeddings and CM scores from pre-trained ECAPA-TDNN and AASIST models respectively, we explore several integration strategies including early, late, and full integration. We further introduce alternating training for multi-module (ATMM) and a refined variant, evading alternating training (EAT). Experimental results on the ASVspoof 2019 Logical Access (LA) and Spoofceleb datasets demonstrate significant…
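The abstract names alternating training for multi-module (ATMM) and its refinement, evading alternating training (EAT), but does not give their schedules. The sketch below is a generic illustration only and is not the authors' ATMM or EAT. It shows one common reading of "alternating training": round-robin block updates, where each step updates a single module's parameters while the others stay frozen. The function `alternating_descent`, the toy coupled quadratic loss, the learning rate, and the step count are all hypothetical.

```python
from typing import Callable, List, Sequence

Gradient = Callable[[Sequence[float]], float]


def alternating_descent(
    grad_fns: Sequence[Gradient],
    init_params: Sequence[float],
    lr: float = 0.1,
    steps: int = 1000,
) -> List[float]:
    """Round-robin block updates: each step moves exactly one parameter block.

    grad_fns[k](params) returns the partial derivative of the joint loss with
    respect to block k; all other blocks are held fixed ("frozen") that step.
    """
    if len(grad_fns) != len(init_params):
        raise ValueError("need one gradient function per parameter block")
    params = list(init_params)
    for step in range(steps):
        k = step % len(params)  # which "module" is trained on this step
        params[k] -= lr * grad_fns[k](params)
    return params


# Hypothetical joint loss coupling two one-parameter "modules" a and b:
#   L(a, b) = (a - 1)^2 + (b + 2)^2 + 0.5 * a * b
# Its unique minimiser is a = 1.6, b = -2.4.
def grad_a(p: Sequence[float]) -> float:
    return 2.0 * (p[0] - 1.0) + 0.5 * p[1]


def grad_b(p: Sequence[float]) -> float:
    return 2.0 * (p[1] + 2.0) + 0.5 * p[0]


a, b = alternating_descent([grad_a, grad_b], [0.0, 0.0])
print(round(a, 4), round(b, 4))  # 1.6 -2.4
```

The coupling term `0.5 * a * b` stands in for the interaction between jointly trained modules. Because of it, each block's update depends on the current value of the other, yet the round-robin schedule still converges to the joint optimum of this convex toy loss.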
Taxonomy
Topics: Speech Recognition and Synthesis · Adversarial Robustness in Machine Learning · Speech and Audio Processing
