Spoofing-Aware Speaker Verification via Wavelet Prompt Tuning and Multi-Model Ensembles

Aref Farhadipour; Ming Jin; Valeriia Vyshnevetska; Xiyang Li; Elisa Pellegrino; Srikanth Madikeri

arXiv:2601.17557 · eess.AS · January 27, 2026

Open Access

TL;DR

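This paper presents a spoofing-aware speaker verification system that cascades a wavelet prompt-tuned countermeasure with a multi-model ASV ensemble to verify speaker identity and audio authenticity jointly. It reaches a Macro a-DCF of 0.2017 and an SASV EER of 2.08%, with a 0.16% spoof-detection EER on in-domain data, while results on unseen datasets such as ASVspoof5 expose a cross-domain generalization gap.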

Contribution

The paper introduces a cascaded framework that integrates a Wavelet Prompt-Tuned XLSR-AASIST countermeasure with an ensemble of three ASV models (ResNet34, ResNet293, and WavLM-ECAPA-TDNN), fused by Z-score normalization and score averaging.

Findings

01

Achieved a Macro a-DCF of 0.2017 and an SASV EER of 2.08%.

02

The system attained a 0.16% EER in spoof detection on in-domain data.

03

Results on unseen datasets such as ASVspoof5 show that cross-domain generalization remains a significant challenge.
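
The EER figures above are the operating point where the false-acceptance and false-rejection rates are equal. As a rough illustration only, the sketch below estimates an EER from two score arrays by sweeping thresholds over the observed scores. The function name `compute_eer` and the toy scores are hypothetical, and this is not the challenge's official scoring code. For an SASV EER one would typically pool zero-effort impostor and spoofed trials as the negative class, which is also an assumption here.

```python
import numpy as np

def compute_eer(positive_scores, negative_scores):
    """Estimate the equal error rate by sweeping thresholds over observed scores.

    positive_scores: scores of trials that should be accepted.
    negative_scores: scores of trials that should be rejected.
    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    pos = np.asarray(positive_scores, dtype=float)
    neg = np.asarray(negative_scores, dtype=float)
    thresholds = np.sort(np.concatenate([pos, neg]))

    # A trial is accepted when its score is >= the threshold.
    frr = np.array([(pos < t).mean() for t in thresholds])   # false rejections
    far = np.array([(neg >= t).mean() for t in thresholds])  # false acceptances

    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]

# Toy usage with made-up scores (not data from the paper).
eer_separable, _ = compute_eer([2.0, 3.0, 4.0], [-1.0, 0.0, 1.0])
eer_overlap, _ = compute_eer([0.1, 0.9], [0.2, 0.8])
```

This threshold sweep is a coarse estimate; evaluation toolkits usually interpolate along the DET curve, so values can differ slightly on small trial lists.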

Abstract

This paper describes the UZH-CL system submitted to the SASV section of the WildSpoof 2026 challenge. The challenge focuses on integrated defense against generative spoofing attacks by requiring the simultaneous verification of speaker identity and audio authenticity. We propose a cascaded Spoofing-Aware Speaker Verification framework that integrates a Wavelet Prompt-Tuned XLSR-AASIST countermeasure with a multi-model ASV ensemble. The ASV component uses the ResNet34, ResNet293, and WavLM-ECAPA-TDNN architectures, whose scores are Z-score normalized and then averaged. Trained on VoxCeleb2 and SpoofCeleb, the system obtained a Macro a-DCF of 0.2017 and an SASV EER of 2.08%. While the system achieved a 0.16% EER in spoof detection on in-domain data, results on unseen datasets such as ASVspoof5 highlight the critical challenge of cross-domain generalization.
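
The fusion step described in the abstract (Z-score normalization followed by score averaging, placed in a cascade with the countermeasure) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the normalization statistics are computed over the trial list itself, the cascade is modeled as a hard gate on the countermeasure score, and all names (`z_norm`, `fuse_asv`, `cascade_decision`), thresholds, and scores are hypothetical.

```python
import numpy as np

def z_norm(scores):
    """Z-score normalize one system's scores (statistics from the scores themselves)."""
    scores = np.asarray(scores, dtype=float)
    std = scores.std()
    return (scores - scores.mean()) / (std if std > 0 else 1.0)

def fuse_asv(score_lists):
    """Average the Z-normalized scores of several ASV systems, trial by trial."""
    return np.mean([z_norm(s) for s in score_lists], axis=0)

def cascade_decision(cm_scores, asv_scores, cm_threshold, asv_threshold):
    """Accept a trial only if it passes the countermeasure AND the ASV check.

    Assumption: the cascade is a hard gate; the paper may combine scores differently.
    """
    cm_ok = np.asarray(cm_scores, dtype=float) >= cm_threshold
    asv_ok = np.asarray(asv_scores, dtype=float) >= asv_threshold
    return cm_ok & asv_ok

# Toy usage: three ASV systems with different score ranges (made-up numbers).
resnet34_scores = [1.0, 2.0, 3.0]
resnet293_scores = [10.0, 20.0, 30.0]
wavlm_ecapa_scores = [0.1, 0.2, 0.3]
fused = fuse_asv([resnet34_scores, resnet293_scores, wavlm_ecapa_scores])

cm_scores = [0.9, 0.1, 0.8]  # higher means more likely bona fide (assumed convention)
decisions = cascade_decision(cm_scores, fused, cm_threshold=0.5, asv_threshold=0.0)
```

Normalizing before averaging keeps a system with a wide score range (here the second one) from dominating the mean, which is the usual motivation for this kind of fusion.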

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Voice and Speech Disorders