DFKI-Speech System for WildSpoof Challenge: A robust framework for SASV In-the-Wild

Arnab Das; Yassine El Kheir; Enes Erdem Erdogan; Feidi Kallel; Tim Polzehl; Sebastian Moeller

arXiv:2602.02286 · cs.SD · February 3, 2026

Open Access

TL;DR

This paper introduces a robust SASV framework for the WildSpoof Challenge, combining a self-supervised spoofing detector with an advanced speaker verification system, achieving improved detection and verification performance in challenging conditions.

Contribution

The paper presents an SASV system with two components that operate in tandem: a spoofing detector that combines a self-supervised speech embedding frontend, a graph neural network backend, and mixture-of-experts fusion; and a low-complexity CNN speaker verifier trained with the SphereFace loss and contrastive circle loss.

Findings

01. Effective spoofed utterance detection using GNN and MoE fusion.

02. Improved speaker verification with SphereFace and contrastive circle loss.

03. Enhanced system robustness through score normalization and ensembling.
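
The third finding mentions score normalization and ensembling without saying which variants are used. As a minimal sketch, assuming symmetric normalization (S-norm) against cohorts of impostor scores and a weighted average across systems, the idea might look like the following. The function names, cohort lists, and weights are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def z_norm(raw_score, cohort_scores):
    """Shift and scale a raw trial score by the cohort mean and std."""
    mu = mean(cohort_scores)
    sigma = pstdev(cohort_scores)
    # Guard against a degenerate cohort with zero spread.
    return (raw_score - mu) / max(sigma, 1e-8)


def s_norm(raw_score, enroll_cohort_scores, test_cohort_scores):
    """Symmetric normalization: average the enrollment-side and
    test-side normalized scores (an assumed choice, not the paper's)."""
    return 0.5 * (
        z_norm(raw_score, enroll_cohort_scores)
        + z_norm(raw_score, test_cohort_scores)
    )


def ensemble(scores, weights=None):
    """Weighted average of per-system scores for one trial."""
    if weights is None:
        weights = [1.0] * len(scores)
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


# Hypothetical trial: two systems, each with its own impostor cohorts.
system_a = s_norm(0.8, [0.1, 0.2, 0.3], [0.0, 0.2, 0.4])
system_b = s_norm(0.6, [0.1, 0.3, 0.2], [0.2, 0.1, 0.3])
print(ensemble([system_a, system_b]))
```

Normalizing before averaging puts the systems on a comparable scale, so that no single system dominates the ensemble only because its raw scores have a wider range.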

Abstract

This paper presents the DFKI-Speech system developed for the WildSpoof Challenge under the Spoofing-aware Automatic Speaker Verification (SASV) track. We propose a robust SASV framework in which a spoofing detector and a speaker verification (SV) network operate in tandem. The spoofing detector employs a self-supervised speech embedding extractor as the frontend, combined with a state-of-the-art graph neural network backend. In addition, a top-3 layer-based mixture-of-experts (MoE) is used to fuse high-level and low-level features for effective spoofed utterance detection. For speaker verification, we adapt a low-complexity convolutional neural network that fuses 2D and 1D features at multiple scales, trained with the SphereFace loss. Additionally, contrastive circle loss is applied to adaptively weight positive and negative pairs within each training batch, enabling the network to…
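
The abstract describes a circle loss that adaptively weights positive and negative pairs within a batch. As a rough illustration, the sketch below implements the standard pairwise circle loss on precomputed cosine similarities. It is an assumption that the paper's "contrastive circle loss" resembles this form; the margin and scale values are placeholder defaults, and all names are hypothetical.

```python
import math


def _logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def _softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def circle_loss(pos_sims, neg_sims, margin=0.25, gamma=64.0):
    """Pairwise circle loss over same-speaker (pos) and
    different-speaker (neg) cosine similarities."""
    o_p, o_n = 1.0 + margin, -margin   # optima for positives / negatives
    d_p, d_n = 1.0 - margin, margin    # decision margins
    # Adaptive weights: pairs far from their optimum get larger weight.
    logit_p = [-gamma * max(o_p - s, 0.0) * (s - d_p) for s in pos_sims]
    logit_n = [gamma * max(s - o_n, 0.0) * (s - d_n) for s in neg_sims]
    return _softplus(_logsumexp(logit_p) + _logsumexp(logit_n))


# Well-separated pairs should cost less than poorly separated ones.
print(circle_loss([0.9], [0.1]))
print(circle_loss([0.3], [0.7]))
```

The per-pair weights shrink to zero as a similarity approaches its optimum, so already well-separated pairs contribute little and the harder pairs carry most of the loss.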

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Respiratory and Cough-Related Research