LG Uplus System with Multi-Speaker IDs and Discriminator-based Sub-Judges for the WildSpoof Challenge
Jinyoung Park, Won Jang, Jiwoong Park

TL;DR
This paper presents a spoof-aware speaker verification system for high-quality TTS attacks, utilizing multi-speaker IDs and discriminator-based sub-judges to improve detection accuracy in the WildSpoof Challenge.
Contribution
The paper introduces dual and multi-speaker ID strategies and discriminator-based sub-judges using internal GAN features for enhanced spoof detection.
Findings
Improved agnostic detection cost function (a-DCF) on the SpoofCeleb corpus.
Effective use of discriminator features for spoof detection.
Enhanced speaker verification robustness against TTS attacks.
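For readers unfamiliar with a-DCF, the following is a minimal sketch of how such a cost is commonly computed for a single SASV score stream: a weighted sum of the miss rate on target trials and the false-acceptance rates on non-target and spoof trials. The function names and all cost and prior values below are illustrative placeholders, not the settings used by the WildSpoof Challenge or by the authors, and the normalization applied in some definitions is omitted.

```python
import numpy as np

def a_dcf(tar, non, spf, threshold,
          c_miss=1.0, c_fa_non=10.0, c_fa_spf=10.0,
          pi_tar=0.9, pi_non=0.05, pi_spf=0.05):
    """Unnormalized a-DCF at one threshold (costs and priors are placeholders)."""
    p_miss = np.mean(np.asarray(tar) < threshold)     # targets wrongly rejected
    p_fa_non = np.mean(np.asarray(non) >= threshold)  # non-targets wrongly accepted
    p_fa_spf = np.mean(np.asarray(spf) >= threshold)  # spoofs wrongly accepted
    return (c_miss * pi_tar * p_miss
            + c_fa_non * pi_non * p_fa_non
            + c_fa_spf * pi_spf * p_fa_spf)

def min_a_dcf(tar, non, spf, **kwargs):
    """Minimum of a_dcf over all candidate thresholds taken from the scores."""
    thresholds = np.unique(np.concatenate([tar, non, spf, [np.inf]]))
    return min(a_dcf(tar, non, spf, t, **kwargs) for t in thresholds)

# Toy scores: higher means "more likely the bona fide target speaker".
tar_scores = [0.9, 0.8]
non_scores = [0.1, 0.2]
spf_scores = [0.3, 0.0]
print(min_a_dcf(tar_scores, non_scores, spf_scores))
```

Lower values are better, so an "improved" a-DCF means a smaller cost at the chosen operating point.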
Abstract
This paper describes our submission to the WildSpoof Challenge Track 2, which focuses on spoof-aware speaker verification (SASV) in the presence of high-quality text-to-speech (TTS) attacks. We adopt a ResNet-221 backbone and study two speaker-labeling strategies, namely Dual-Speaker IDs and Multi-Speaker IDs, to explicitly enlarge the margin between bona fide and generated speech in the embedding space. In addition, we propose discriminator-based sub-judge systems that reuse internal features from HiFi-GAN and BigVGAN discriminators, aggregated via multi-query multi-head attentive statistics pooling (MQMHA). Experimental results on the SpoofCeleb corpus show that our system design is effective in improving the agnostic detection cost function (a-DCF).
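As a rough illustration of the pooling step named in the abstract, the sketch below implements a simplified multi-query multi-head attentive statistics pooling in NumPy. It is not the authors' implementation: the function name `mqmha_pool`, the dot-product scoring, the random queries, and the input shape are all assumptions, and `feats` merely stands in for one frame-level feature map (for example, a discriminator layer output flattened to time by channels).

```python
import numpy as np

def mqmha_pool(feats, queries, eps=1e-8):
    """Simplified MQMHA pooling sketch.

    feats:   (T, D) frame-level features.
    queries: (H, Q, D // H) attention queries, H heads with Q queries each
             (learned in a real model; supplied by the caller here).
    Returns a vector of size 2 * Q * D (weighted means, then weighted stds).
    """
    T, D = feats.shape
    H, Q, d = queries.shape
    assert H * d == D, "channel dimension must split evenly across heads"
    heads = feats.reshape(T, H, d)                       # split channels into heads
    scores = np.einsum("thd,hqd->hqt", heads, queries)   # (H, Q, T) attention logits
    scores = scores - scores.max(axis=-1, keepdims=True) # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over time
    mean = np.einsum("hqt,thd->hqd", weights, heads)     # weighted mean
    second = np.einsum("hqt,thd->hqd", weights, heads ** 2)
    std = np.sqrt(np.clip(second - mean ** 2, eps, None))  # weighted std
    return np.concatenate([mean.ravel(), std.ravel()])

rng = np.random.default_rng(0)
frames = rng.standard_normal((50, 8))       # hypothetical 50 frames, 8 channels
attn_queries = rng.standard_normal((2, 3, 4))  # 2 heads, 3 queries per head
pooled = mqmha_pool(frames, attn_queries)
print(pooled.shape)
```

Each query attends to the frames of its head independently, so several queries per head can focus on different temporal regions before their statistics are concatenated into a single utterance-level vector.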
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Authorship Attribution and Profiling
