A Data-Centric Approach to Generalizable Speech Deepfake Detection

Wen Huang; Yuchen Mao; Yanmin Qian

arXiv:2512.18210 · cs.SD · December 30, 2025

Open Access

TL;DR

This paper argues that data composition is a key factor in speech deepfake detection and introduces a data-centric framework that combines a large-scale empirical analysis with a novel sampling strategy, improving both generalization and data efficiency.

Contribution

It presents a data-centric approach to improving generalization in speech deepfake detection, comprising an empirical characterization of data scaling laws and the Diversity-Optimized Sampling Strategy (DOSS) for mixing heterogeneous datasets.

Findings

01. DOSS-Select outperforms naive data aggregation while using only 3% of the data.
02. Training with DOSS-Weight on 12k hours of data achieves state-of-the-art results.
03. Data composition significantly affects deepfake detection performance.
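Findings 01 and 02 refer to the two DOSS variants, which the abstract describes as pruning (DOSS-Select) and re-weighting (DOSS-Weight) of heterogeneous data. This page does not give DOSS's actual diversity criterion, so the sketch below swaps in a generic stand-in to illustrate the two modes: samples are grouped by a hypothetical (source, generator) key, pruning keeps a round-robin selection across groups under a fixed budget, and re-weighting assigns inverse-frequency weights so every group carries equal total weight. The functions `group_key`, `balanced_select`, and `inverse_frequency_weights` and the dictionary fields are assumptions for illustration, not the authors' code.

```python
import random
from collections import Counter


def group_key(sample):
    # Hypothetical diversity key: which speech source and which generator
    # (e.g. a TTS or VC system, or "bonafide") produced the utterance.
    return (sample["source"], sample["generator"])


def balanced_select(samples, budget, seed=0):
    """Pruning stand-in (not DOSS-Select): keep at most `budget` samples.

    Groups are visited round-robin, so small groups are kept in full before
    large groups contribute more than their share.
    """
    rng = random.Random(seed)
    groups = {}
    for s in samples:
        groups.setdefault(group_key(s), []).append(s)
    for members in groups.values():
        rng.shuffle(members)  # random order within each group

    selected = []
    keys = sorted(groups)
    depth = 0
    while len(selected) < budget:
        added = False
        for k in keys:
            if depth < len(groups[k]) and len(selected) < budget:
                selected.append(groups[k][depth])
                added = True
        if not added:  # every group exhausted before reaching the budget
            break
        depth += 1
    return selected


def inverse_frequency_weights(samples):
    """Re-weighting stand-in (not DOSS-Weight): equal total weight per group.

    Weights are normalized so that they sum to len(samples).
    """
    counts = Counter(group_key(s) for s in samples)
    scale = len(samples) / len(counts)
    return [scale / counts[group_key(s)] for s in samples]


# Toy pool dominated by one generator (hypothetical data).
pool = (
    [{"source": "A", "generator": "tts1"}] * 90
    + [{"source": "A", "generator": "vc1"}] * 9
    + [{"source": "B", "generator": "tts2"}] * 1
)
subset = balanced_select(pool, budget=10)
print(Counter(group_key(s) for s in subset))
# Counter({('A', 'tts1'): 5, ('A', 'vc1'): 4, ('B', 'tts2'): 1})
weights = inverse_frequency_weights(pool)
```

On this toy pool, a 10-sample budget keeps the single `tts2` sample and splits the rest 5/4, while the weights give each of the three groups a total weight of about 33.3 out of 100. The paper's method may measure and optimize diversity quite differently.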

Abstract

Achieving robust generalization in speech deepfake detection (SDD) remains a primary challenge, as models often fail to detect unseen forgery methods. While research has focused on model-centric and algorithm-centric solutions, the impact of data composition remains underexplored. This paper proposes a data-centric approach, analyzing the SDD data landscape from two practical perspectives: constructing a single dataset and aggregating multiple datasets. To address the first perspective, we conduct a large-scale empirical study to characterize the data scaling laws for SDD, quantifying the impact of source and generator diversity. To address the second, we propose the Diversity-Optimized Sampling Strategy (DOSS), a principled framework for mixing heterogeneous data with two implementations: DOSS-Select (pruning) and DOSS-Weight (re-weighting). Our experiments show that DOSS-Select…
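The abstract refers to characterizing data scaling laws for SDD in terms of source and generator diversity, but does not state their functional form here. A common choice in scaling-law studies is a power law, error ≈ a·N^-b. Under that assumption, the sketch below fits a and b by ordinary least squares in log-log space. The metric (EER), the variable N (number of distinct generators), and the data points are all hypothetical and synthetic, not results from the paper.

```python
import math


def fit_power_law(sizes, errors):
    """Fit errors ~ a * sizes**(-b) by least squares in log-log space.

    Returns (a, b). Assumes every size and error is strictly positive.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # log(err) = log(a) - b * log(N), so a = exp(intercept), b = -slope
    return math.exp(intercept), -slope


def predict(a, b, n):
    # Extrapolate the fitted curve to a new diversity level n
    return a * n ** (-b)


# Synthetic, noise-free curve (NOT data from the paper): EER% = 20 * N^-0.5,
# where N is a hypothetical count of distinct generators in the training set.
generators = [1, 2, 4, 8, 16, 32]
eer = [20.0 * n ** -0.5 for n in generators]
a, b = fit_power_law(generators, eer)
print(round(a, 3), round(b, 3))  # 20.0 0.5
print(round(predict(a, b, 64), 3))  # 2.5
```

Real measurements would be noisy, and a saturating form such as a·N^-b + c often fits better once error approaches an irreducible floor; which form, if any, the paper adopts is not stated on this page.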

Taxonomy

Topics: Speech Recognition and Synthesis · Generative Adversarial Networks and Image Synthesis · Hate Speech and Cyberbullying Detection