DeepFake Doctor: Diagnosing and Treating Audio-Video Fake Detection

Marcel Klemt; Carlotta Segna; Anna Rohrbach

arXiv:2506.05851 · cs.MM · June 9, 2025

TL;DR

This paper critically examines the challenges in benchmarking audio-video DeepFake detection, introduces a new dataset and evaluation protocol, and proposes a simple yet effective baseline model to improve detection robustness.

Contribution

The paper is the first to propose an evaluation protocol and a benchmark of state-of-the-art (SOTA) audio-video detection models. It also introduces SIMBA, a minimalistic multimodal detection approach, and addresses key dataset and evaluation issues.

Findings

01

Identified critical issues with existing datasets like FakeAVCeleb.

02

Proposed a new evaluation protocol for better benchmarking.

03

Introduced SIMBA, a competitive baseline for multimodal DeepFake detection.

Abstract

Generative AI advances rapidly, allowing the creation of very realistic manipulated video and audio. This progress presents a significant security and ethical threat, as malicious users can exploit DeepFake techniques to spread misinformation. Recent DeepFake detection approaches explore the multimodal (audio-video) threat scenario. However, this line of work suffers from a lack of reproducibility and from critical issues with existing datasets, such as the recently uncovered silence shortcut in the widely used FakeAVCeleb dataset. Considering the importance of this topic, we aim to gain a deeper understanding of the key issues affecting benchmarking in audio-video DeepFake detection. We examine these challenges through the lens of the three core benchmarking pillars: datasets, detection methods, and evaluation protocols. To address these issues, we spotlight the recent DeepSpeak v1 dataset and are the…
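The abstract mentions the silence shortcut in FakeAVCeleb without describing how such a shortcut can be detected. As a hypothetical illustration (not the authors' method or code), one way to probe a dataset for an audio-only shortcut is to compute a single trivial feature per clip and check how well a one-threshold rule separates real from fake; near-perfect accuracy from such a feature suggests a shortcut. This sketch assumes the shortcut shows up as a difference in leading-silence duration, and the function names, RMS threshold, frame length, and label convention (1 = fake, 0 = real) are all assumptions.

```python
import math
from typing import Sequence

def leading_silence_seconds(samples: Sequence[float], sample_rate: int,
                            threshold: float = 0.01, frame_len: int = 400) -> float:
    """Hypothetical helper: seconds before the first frame whose RMS exceeds `threshold`."""
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms > threshold:
            return start / sample_rate
    return len(samples) / sample_rate  # the whole clip is below the threshold

def stump_accuracy(features: Sequence[float], labels: Sequence[int]) -> float:
    """Best accuracy of a single-threshold rule on one feature, trying both directions.

    Assumed label convention: 1 = fake, 0 = real.
    """
    n = len(labels)
    best = 0.0
    for t in sorted(set(features)):
        # Rule: predict "fake" when feature < t; the complementary rule flips every prediction.
        correct = sum((f < t) == bool(y) for f, y in zip(features, labels))
        best = max(best, correct / n, (n - correct) / n)
    return best
```

In practice one would compute `leading_silence_seconds` for every clip in a split and pass the results to `stump_accuracy`; a score close to 1.0 from this single feature would indicate that a detector could succeed without examining any manipulation artifacts.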

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning · Digital Media Forensic Detection