Evaluating Deepfake Detectors in the Wild

Viacheslav Pirogov; Maksim Artemev

arXiv:2507.21905·cs.CV·August 5, 2025

TL;DR

This paper evaluates the effectiveness of current deepfake detectors in real-world scenarios using a large, newly created dataset, revealing that detection remains challenging and susceptible to simple image manipulations.

Contribution

It introduces a novel testing procedure and a comprehensive dataset to assess deepfake detectors' real-world performance, highlighting their limitations.

Findings

01 Fewer than half of the detectors tested scored above 60% AUC

02 Basic image manipulations significantly reduce detection performance

03 Detection remains a challenging task in real-world conditions

Abstract

Deepfakes powered by advanced machine learning models present a significant and evolving threat to identity verification and the authenticity of digital media. Although numerous detectors have been developed to address this problem, their effectiveness has yet to be tested on real-world data. In this work, we evaluate modern deepfake detectors, introducing a novel testing procedure designed to mimic real-world scenarios for deepfake detection. Using state-of-the-art deepfake generation methods, we create a comprehensive dataset containing more than 500,000 high-quality deepfake images. Our analysis shows that detecting deepfakes remains a challenging task. Fewer than half of the deepfake detectors tested achieved an AUC score greater than 60%, with the lowest being 50%. We demonstrate that basic image manipulations, such as JPEG compression…
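To put the AUC figures in context, here is a minimal sketch of how such a score can be computed. It assumes that "AUC" means the area under the ROC curve, which this excerpt does not state explicitly. The function name `roc_auc`, the label convention (1 = fake, 0 = real), and the toy scores are all illustrative assumptions, not the authors' evaluation code or data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen fake (label 1) receives
    a higher score than a randomly chosen real image (label 0); ties
    count as half a win.
    """
    fakes = [s for y, s in zip(labels, scores) if y == 1]
    reals = [s for y, s in zip(labels, scores) if y == 0]
    if not fakes or not reals:
        raise ValueError("need at least one sample of each class")

    wins = 0.0
    for f in fakes:
        for r in reals:
            if f > r:
                wins += 1.0
            elif f == r:
                wins += 0.5
    return wins / (len(fakes) * len(reals))


# Toy data (hypothetical): three fakes followed by three real images.
labels = [1, 1, 1, 0, 0, 0]

perfect_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]    # every fake outranks every real
weak_scores = [0.9, 0.4, 0.3, 0.8, 0.35, 0.1]      # only partly separates the classes
constant_scores = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]   # detector carries no information

auc_perfect = roc_auc(labels, perfect_scores)
auc_weak = roc_auc(labels, weak_scores)
auc_constant = roc_auc(labels, constant_scores)

print(f"perfect:  {auc_perfect:.3f}")
print(f"weak:     {auc_weak:.3f}")
print(f"constant: {auc_constant:.3f}")
```

Under this reading, a detector that gives every image the same score lands at exactly 0.5. The reported minimum of 50% is therefore what an uninformative detector would achieve, and 60% is only modestly above it.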

Code & Models

Datasets

Sumsub/Swappir
dataset · 119 dl
