A Data-Centric Approach to Generalizable Speech Deepfake Detection
Wen Huang, Yuchen Mao, Yanmin Qian

TL;DR
This paper emphasizes the importance of data composition in speech deepfake detection, introducing a data-centric framework with empirical analysis and a novel sampling strategy that enhances model robustness and efficiency.
Contribution
The paper presents a data-centric approach to improving generalization in speech deepfake detection, combining an empirical study of data scaling laws with the DOSS framework for mixing heterogeneous datasets.
Findings
DOSS-Select outperforms naive data aggregation while using only 3% of the data.
Training on 12k hours of data with DOSS-Weight achieves state-of-the-art results.
Data composition significantly impacts deepfake detection performance.
Abstract
Achieving robust generalization in speech deepfake detection (SDD) remains a primary challenge, as models often fail to detect unseen forgery methods. While research has focused on model-centric and algorithm-centric solutions, the impact of data composition is often underexplored. This paper proposes a data-centric approach, analyzing the SDD data landscape from two practical perspectives: constructing a single dataset and aggregating multiple datasets. To address the first perspective, we conduct a large-scale empirical study to characterize the data scaling laws for SDD, quantifying the impact of source and generator diversity. To address the second, we propose the Diversity-Optimized Sampling Strategy (DOSS), a principled framework for mixing heterogeneous data with two implementations: DOSS-Select (pruning) and DOSS-Weight (re-weighting). Our experiments show that DOSS-Select…
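The abstract's distinction between pruning and re-weighting can be pictured with a small generic sketch. This is not the DOSS algorithm, which the text above does not specify. It only assumes that each training sample carries a domain label (the names `tts_a`, `vc_b`, and `real` are hypothetical), then shows a plain inverse-frequency re-weighting and a per-domain cap as stand-ins for the two ideas.

```python
from collections import Counter, defaultdict
import random


def balanced_weights(domains):
    """Inverse-frequency weights: every domain receives the same total mass.

    Generic illustration of re-weighting, not the paper's DOSS-Weight.
    """
    counts = Counter(domains)
    n_domains = len(counts)
    return [1.0 / (n_domains * counts[d]) for d in domains]


def balanced_select(domains, per_domain, seed=0):
    """Keep at most `per_domain` sample indices from each domain.

    Generic illustration of pruning, not the paper's DOSS-Select.
    """
    rng = random.Random(seed)
    by_domain = defaultdict(list)
    for i, d in enumerate(domains):
        by_domain[d].append(i)
    kept = []
    for idxs in by_domain.values():
        rng.shuffle(idxs)
        kept.extend(idxs[:per_domain])
    return sorted(kept)


# Hypothetical, heavily imbalanced pool: 6 + 3 + 1 samples.
domains = ["tts_a"] * 6 + ["vc_b"] * 3 + ["real"] * 1
weights = balanced_weights(domains)
subset = balanced_select(domains, per_domain=2)
```

Under these assumptions, the weights sum to one with each domain contributing a third, and the pruned subset keeps two samples from each of the larger domains plus the single `real` sample.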
Taxonomy
Topics: Speech Recognition and Synthesis · Generative Adversarial Networks and Image Synthesis · Hate Speech and Cyberbullying Detection
