The DeepFake Detection Challenge (DFDC) Dataset
Brian Dolhansky, Joanna Bitton, Ben Pflaum, Jikuo Lu, Russ Howes, Menglin Wang, Cristian Canton Ferrer

TL;DR
This paper introduces the DeepFake Detection Challenge (DFDC) dataset, the largest publicly available face swap video dataset, and analyzes the challenge's top submissions, highlighting the difficulty of detecting Deepfakes and the potential of models trained on this dataset.
Contribution
The paper presents the creation of the extensive DFDC dataset and provides insights from the Kaggle competition to advance DeepFake detection research.
Findings
Deepfake detection remains a challenging and unsolved problem.
Models trained on DFDC can generalize to real-world Deepfake videos.
The dataset enables the development of more robust detection methods.
Abstract
Deepfakes are a recent off-the-shelf manipulation technique that allows anyone to swap two identities in a single video. In addition to Deepfakes, a variety of GAN-based face swapping methods have also been published with accompanying code. To counter this emerging threat, we have constructed an extremely large face swap video dataset to enable the training of detection models, and organized the accompanying DeepFake Detection Challenge (DFDC) Kaggle competition. Importantly, all recorded subjects agreed to participate in and have their likenesses modified during the construction of the face-swapped dataset. The DFDC dataset is by far the largest currently and publicly available face swap video dataset, with over 100,000 total clips sourced from 3,426 paid actors, produced with several Deepfake, GAN-based, and non-learned methods. In addition to describing the methods used to construct…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection · Face recognition and analysis
