TL;DR
This paper introduces AFSS, a novel method that reduces bias in audio deepfake detectors by generating pseudo-fake samples through self-synthesis, improving generalization without relying on pre-collected fake datasets.
Contribution
AFSS is the first approach to use artifact-focused self-synthesis with speaker constraints and dynamic loss reweighting to mitigate bias in deepfake detection.
Findings
Achieves a state-of-the-art average equal error rate (EER) of 5.45% across 7 datasets.
Reduces the EER on WaveFake to 1.23%.
Eliminates the need for pre-collected fake datasets.
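For readers unfamiliar with the metric, the equal error rate (EER) is the operating point at which the rate of fake clips accepted as real equals the rate of real clips flagged as fake. The following is a minimal sketch of how it can be approximated from detector scores. The function name `equal_error_rate`, the score convention (higher means more likely fake), and the threshold sweep over observed scores are assumptions made for illustration, not the paper's evaluation code.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER from detector scores and ground-truth labels.

    Assumed convention (not from the paper): a higher score means
    "more likely fake"; label 1 = fake, label 0 = real.
    """
    n_fake = sum(labels)
    n_real = len(labels) - n_fake
    if n_fake == 0 or n_real == 0:
        raise ValueError("need at least one real and one fake sample")

    best_gap, eer = float("inf"), None
    # Sweep every observed score as a candidate decision threshold.
    for t in sorted(set(scores)):
        # False acceptance: fake audio scored below the threshold.
        far = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t) / n_fake
        # False rejection: real audio scored at or above the threshold.
        frr = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t) / n_real
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
print(equal_error_rate(scores, labels))
```

In this toy input one real clip and one fake clip are on the wrong side of the best threshold, so both error rates meet at one in four.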
Abstract
The rapid advancement of generative models has enabled highly realistic audio deepfakes, yet current detectors suffer from a critical bias problem, leading to poor generalization across unseen datasets. This paper proposes Artifact-Focused Self-Synthesis (AFSS), a method designed to mitigate this bias by generating pseudo-fake samples from real audio via two mechanisms: self-conversion and self-reconstruction. The core insight of AFSS lies in enforcing same-speaker constraints, ensuring that real and pseudo-fake samples share identical speaker identity and semantic content. This forces the detector to focus exclusively on generation artifacts rather than irrelevant confounding factors. Furthermore, we introduce a learnable reweighting loss to dynamically emphasize synthetic samples during training. Extensive experiments across 7 datasets demonstrate that AFSS achieves state-of-the-art…
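The two ideas in the abstract, same-speaker pseudo-fake generation and a reweighted loss, can be illustrated with a short sketch. Everything below is hypothetical: `make_pseudo_fakes`, `reweighted_bce`, the placeholder callables `self_convert` and `self_reconstruct`, and the weight form `1 + softplus(alpha)` are assumptions chosen for illustration. The abstract does not specify the models used for self-synthesis or the exact form of the learnable reweighting loss.

```python
import math


def make_pseudo_fakes(real_clips, self_convert, self_reconstruct):
    """Build (audio, label) pairs; label 0 = real, 1 = pseudo-fake.

    `self_convert` and `self_reconstruct` are hypothetical callables
    standing in for the two self-synthesis mechanisms.
    """
    batch = []
    for clip in real_clips:
        batch.append((clip, 0))
        # Both pseudo-fakes are derived from the same real clip, so the
        # speaker identity and spoken content match the real sample.
        batch.append((self_convert(clip), 1))
        batch.append((self_reconstruct(clip), 1))
    return batch


def softplus(x):
    return math.log1p(math.exp(x))


def reweighted_bce(probs, labels, alpha):
    """Weighted-average binary cross-entropy.

    Assumed form (not from the paper): pseudo-fake samples get weight
    1 + softplus(alpha), real samples get weight 1. In training, alpha
    would be a learnable parameter; here it is a plain float.
    """
    w_fake = 1.0 + softplus(alpha)
    total, weight_sum = 0.0, 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, 1e-7), 1.0 - 1e-7)  # avoid log(0)
        bce = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        w = w_fake if y == 1 else 1.0
        total += w * bce
        weight_sum += w
    return total / weight_sum


# Toy usage with strings standing in for audio clips.
batch = make_pseudo_fakes(
    ["clip_a"],
    self_convert=lambda c: c + "_converted",
    self_reconstruct=lambda c: c + "_reconstructed",
)
print(batch)
```

Because each pseudo-fake shares its speaker and content with the real clip it came from, the only systematic difference left for a detector to learn in this setup is whatever the synthesis step introduces. A larger `alpha` makes errors on pseudo-fake samples count more in the averaged loss.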
