Spoofing-Aware Speaker Verification via Wavelet Prompt Tuning and Multi-Model Ensembles
Aref Farhadipour, Ming Jin, Valeriia Vyshnevetska, Xiyang Li, Elisa Pellegrino, Srikanth Madikeri

TL;DR
This paper presents a spoofing-aware speaker verification system that combines a wavelet prompt-tuned countermeasure with a multi-model ASV ensemble. It reaches a Macro a-DCF of 0.2017 and an SASV EER of 2.08%, with a 0.16% spoof-detection EER on in-domain data, while results on unseen datasets such as ASVspoof5 show that cross-domain generalization remains a challenge.
Contribution
The paper introduces a cascaded framework that pairs a Wavelet Prompt-Tuned XLSR-AASIST countermeasure with an ensemble of three ASV models (ResNet34, ResNet293, and WavLM-ECAPA-TDNN), whose scores are Z-score normalized and then averaged.
Findings
The system achieved a Macro a-DCF of 0.2017 and an SASV EER of 2.08%.
The system attained a 0.16% EER in spoof detection on in-domain data.
Cross-domain generalization remains a significant challenge.
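For readers unfamiliar with the metric, the equal error rate (EER) reported above is the operating point where the false acceptance rate equals the false rejection rate. The following is a minimal sketch using a simple threshold sweep; the function name `compute_eer` and the toy scores are illustrative assumptions, not the challenge's official scoring tool. For an SASV EER, the rejected class would presumably pool zero-effort impostor and spoofed trials, which is also an assumption here.

```python
import numpy as np

def compute_eer(accept_scores, reject_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    accept_scores: scores of trials that should be accepted.
    reject_scores: scores of trials that should be rejected.
    Higher scores are assumed to mean "more likely to be accepted".
    """
    accept = np.asarray(accept_scores, dtype=float)
    reject = np.asarray(reject_scores, dtype=float)
    thresholds = np.sort(np.concatenate([accept, reject]))

    # False rejection: a valid trial scores below the threshold.
    frr = np.array([(accept < t).mean() for t in thresholds])
    # False acceptance: an invalid trial scores at or above the threshold.
    far = np.array([(reject >= t).mean() for t in thresholds])

    # Take the threshold where the two error rates are closest.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0

# Hypothetical toy scores, for illustration only.
toy_eer = compute_eer([0.9, 0.8, 0.7, 0.3], [0.6, 0.4, 0.2, 0.1])
print(f"EER = {100 * toy_eer:.2f}%")
```

A threshold sweep like this is coarse on small trial lists; evaluation toolkits typically interpolate between operating points.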
Abstract
This paper describes the UZH-CL system submitted to the SASV section of the WildSpoof 2026 challenge. The challenge focuses on integrated defense against generative spoofing attacks by requiring the simultaneous verification of speaker identity and audio authenticity. We propose a cascaded Spoofing-Aware Speaker Verification framework that integrates a Wavelet Prompt-Tuned XLSR-AASIST countermeasure with a multi-model ensemble. The ASV component uses the ResNet34, ResNet293, and WavLM-ECAPA-TDNN architectures, with Z-score normalization followed by score averaging. Trained on VoxCeleb2 and SpoofCeleb, the system obtained a Macro a-DCF of 0.2017 and an SASV EER of 2.08%. While the system achieved a 0.16% EER in spoof detection on the in-domain data, results on unseen datasets, such as ASVspoof5, highlight the critical challenge of cross-domain generalization.
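The score fusion described in the abstract (Z-score normalization followed by score averaging, placed in a cascade with the countermeasure) can be sketched as below. This is a minimal illustration under several assumptions that the abstract does not confirm: normalization statistics are taken over the trial list itself, the cascade is modeled as a countermeasure gate applied before the ASV decision, and all function names, thresholds, and scores are hypothetical.

```python
import numpy as np

def z_normalize(scores):
    """Standardize one model's trial scores to zero mean and unit variance.

    Assumption: statistics come from the trial list itself; the actual
    system may estimate them from a separate cohort or development set.
    """
    scores = np.asarray(scores, dtype=float)
    std = scores.std()
    if std == 0.0:
        return np.zeros_like(scores)
    return (scores - scores.mean()) / std

def fuse_asv_scores(per_model_scores):
    """Average the Z-normalized scores of several ASV models, trial by trial."""
    normalized = np.stack([z_normalize(s) for s in per_model_scores])
    return normalized.mean(axis=0)

def cascade_decision(asv_fused, cm_scores, cm_threshold, asv_threshold):
    """Accept a trial only if the countermeasure score passes its threshold
    AND the fused ASV score passes its own threshold (assumed cascade rule)."""
    asv_fused = np.asarray(asv_fused, dtype=float)
    cm_scores = np.asarray(cm_scores, dtype=float)
    return (cm_scores >= cm_threshold) & (asv_fused >= asv_threshold)

# Hypothetical scores for four trials from three ASV models on different scales.
resnet34_scores = [0.2, 0.8, 0.5, 0.1]
resnet293_scores = [10.0, 30.0, 20.0, 5.0]
wavlm_ecapa_scores = [-1.0, 2.0, 0.5, -1.5]

fused = fuse_asv_scores([resnet34_scores, resnet293_scores, wavlm_ecapa_scores])

# Hypothetical countermeasure scores (higher = more likely bona fide).
cm_scores = [0.9, 0.95, 0.1, 0.8]
decisions = cascade_decision(fused, cm_scores, cm_threshold=0.5, asv_threshold=0.0)
print(decisions)
```

Normalizing before averaging keeps a model with a wide score range, such as the second one in the toy data, from dominating the fused score.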
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Voice and Speech Disorders
