BUT Systems for WildSpoof Challenge: SASV in the Wild
Junyi Peng, Jin Li, Johan Rohdin, Lin Zhang, Miroslav Hlaváček, Oldrich Plchot

TL;DR
This paper introduces a robust SASV framework for the WildSpoof Challenge, combining diverse self-supervised audio models, a novel feature augmentation strategy, and fusion techniques to improve spoofing detection in varied environments.
Contribution
We propose a novel SASV system integrating multiple self-supervised front-ends, a distribution uncertainty-based augmentation, and fusion methods to enhance spoofing robustness.
Findings
Achieved lower a-DCFs and EERs compared to baseline systems.
Demonstrated effectiveness of distribution uncertainty augmentation.
Improved robustness against unseen neural vocoders and recording conditions.
Abstract
This paper presents the BUT submission to the WildSpoof Challenge, focusing on the Spoofing-robust Automatic Speaker Verification (SASV) track. We propose a SASV framework designed to bridge the gap between general audio understanding and specialized speech analysis. Our subsystem integrates diverse Self-Supervised Learning front-ends, ranging from general audio models (e.g., Dasheng) to speech-specific encoders (e.g., WavLM). These representations are aggregated via a lightweight Multi-Head Factorized Attention back-end for the corresponding subtasks. Furthermore, we introduce a feature-domain augmentation strategy based on Distribution Uncertainty to explicitly model and mitigate the domain shift caused by unseen neural vocoders and recording environments. By fusing the resulting countermeasure (CM) scores with state-of-the-art automatic speaker verification (ASV) systems, our approach achieves lower a-DCFs and EERs than the baseline systems.
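The abstract does not spell out how the Distribution Uncertainty augmentation is computed. As a rough illustration only, the sketch below assumes one common formulation of the idea: each utterance's per-channel mean and standard deviation are treated as uncertain, and are resampled with a spread estimated from how much those statistics vary across the batch. The function name `dsu_augment`, the `scale` parameter, and the list-of-frames feature layout are hypothetical choices for this example and are not taken from the paper.

```python
import math
import random


def _mean_std(values, eps=1e-6):
    """Return the mean and (biased, eps-stabilised) std of a list of floats."""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / len(values)
    return m, math.sqrt(var + eps)


def dsu_augment(batch, rng, scale=1.0):
    """Perturb per-utterance channel statistics (assumed formulation).

    batch: list of utterances; each utterance is a list of frames;
           each frame is a list of C floats.
    rng:   a random.Random instance (passed in for reproducibility).
    scale: hypothetical knob; 0.0 leaves the features unchanged.
    """
    n_utt = len(batch)
    n_ch = len(batch[0][0])

    # Per-utterance, per-channel (mean, std) computed over time.
    stats = [
        [_mean_std([frame[c] for frame in utt]) for c in range(n_ch)]
        for utt in batch
    ]

    # Spread of those statistics across the batch: the per-channel uncertainty.
    mu_unc = [_mean_std([stats[b][c][0] for b in range(n_utt)])[1]
              for c in range(n_ch)]
    sd_unc = [_mean_std([stats[b][c][1] for b in range(n_utt)])[1]
              for c in range(n_ch)]

    augmented = []
    for b, utt in enumerate(batch):
        # Resample the statistics around their observed values.
        new_mu = [stats[b][c][0] + scale * rng.gauss(0.0, 1.0) * mu_unc[c]
                  for c in range(n_ch)]
        new_sd = [stats[b][c][1] + scale * rng.gauss(0.0, 1.0) * sd_unc[c]
                  for c in range(n_ch)]
        # Normalise with the original statistics, then re-style with the new ones.
        augmented.append([
            [(frame[c] - stats[b][c][0]) / stats[b][c][1] * new_sd[c] + new_mu[c]
             for c in range(n_ch)]
            for frame in utt
        ])
    return augmented
```

In a training pipeline this kind of perturbation would typically be applied to intermediate features with some probability and disabled at test time; whether the submitted systems do so is not stated in the abstract.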
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
