DeepFake Doctor: Diagnosing and Treating Audio-Video Fake Detection
Marcel Klemt, Carlotta Segna, Anna Rohrbach

TL;DR
This paper critically examines the challenges in benchmarking audio-video DeepFake detection, spotlights the recent DeepSpeak v1 dataset together with a new evaluation protocol, and proposes a simple yet effective baseline model to improve detection robustness.
Contribution
The paper is the first to propose an evaluation protocol for the DeepSpeak v1 dataset and to benchmark it with state-of-the-art (SOTA) models. It also introduces SIMBA, a minimalistic multimodal detection approach, and addresses key dataset and evaluation issues.
Findings
Identified critical issues with existing datasets, such as the silence shortcut in the widely used FakeAVCeleb dataset.
Proposed a new evaluation protocol for better benchmarking.
Introduced SIMBA, a competitive baseline for multimodal DeepFake detection.
Abstract
Generative AI advances rapidly, allowing the creation of very realistic manipulated video and audio. This progress presents a significant security and ethical threat, as malicious users can exploit DeepFake techniques to spread misinformation. Recent DeepFake detection approaches explore the multimodal (audio-video) threat scenario. In particular, there is a lack of reproducibility and critical issues with existing datasets - such as the recently uncovered silence shortcut in the widely used FakeAVCeleb dataset. Considering the importance of this topic, we aim to gain a deeper understanding of the key issues affecting benchmarking in audio-video DeepFake detection. We examine these challenges through the lens of the three core benchmarking pillars: datasets, detection methods, and evaluation protocols. To address these issues, we spotlight the recent DeepSpeak v1 dataset and are the…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning · Digital Media Forensic Detection
