Unmasking real-world audio deepfakes: A data-centric approach

David Combei; Adriana Stan; Dan Oneata; Nicolas Müller; Horia Cucu

arXiv:2506.09606·eess.AS·September 30, 2025·Interspeech

TL;DR

This paper introduces a real-world audio deepfake dataset and demonstrates that data-centric strategies like curation and augmentation significantly improve detection robustness and generalization in real-world scenarios.

Contribution

The work presents a new real-world audio deepfake dataset and emphasizes data-centric methods to enhance detection performance without increasing model complexity.

Findings

01. 55% relative reduction in EER on the In-the-Wild dataset
02. 63% reduction in EER on the AI4T dataset
03. Data-centric approaches improve real-world deepfake detection

Abstract

The growing prevalence of real-world deepfakes presents a critical challenge for existing detection systems, which are often evaluated on datasets collected just for scientific purposes. To address this gap, we introduce a novel dataset of real-world audio deepfakes. Our analysis reveals that these real-world examples pose significant challenges, even for the most performant detection models. Rather than increasing model complexity or exhaustively searching for a better alternative, in this work we focus on a data-centric paradigm, employing strategies like dataset curation, pruning, and augmentation to improve model robustness and generalization. Through these methods, we achieve a 55% relative reduction in EER on the In-the-Wild dataset, reaching an absolute EER of 1.7%, and a 63% reduction on our newly proposed real-world deepfakes dataset, AI4T. These results highlight the…
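The headline numbers are stated as equal error rate (EER) and relative reduction in EER. The sketch below is not taken from the paper or its repository; it only illustrates how these two quantities are commonly computed. The function names (`equal_error_rate`, `relative_reduction`, `implied_baseline`) and the convention that a higher score means "more likely fake" are assumptions made for this illustration.

```python
def equal_error_rate(scores, labels):
    """Approximate EER by sweeping a threshold over the observed scores.

    Assumed convention: label 1 = fake, label 0 = real, and a higher score
    means "more likely fake". The EER is taken at the threshold where the
    false positive rate and false negative rate are closest.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")

    best_gap, eer = None, None
    # Candidate thresholds: every observed score, plus +inf ("flag nothing").
    for t in sorted(set(scores)) + [float("inf")]:
        fpr = sum(s >= t for s in neg) / len(neg)  # real audio flagged as fake
        fnr = sum(s < t for s in pos) / len(pos)   # fake audio passed as real
        gap = abs(fpr - fnr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (fpr + fnr) / 2
    return eer


def relative_reduction(baseline, improved):
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


def implied_baseline(improved, reduction):
    """Baseline value implied by an improved value and a relative reduction."""
    return improved / (1.0 - reduction)
```

Under these definitions, a 55% relative reduction that ends at an absolute EER of 1.7% implies a starting EER of roughly 1.7 / 0.45, or about 3.8%. This is a back-of-the-envelope figure derived from the rounded numbers in the abstract, not a value reported on this page.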

Code & Models

Repositories

davidcombei/ai4t (PyTorch, official)
