A Large-scale Universal Evaluation Benchmark For Face Forgery Detection
Yijun Bei, Hengrui Lou, Jinsong Geng, Erteng Liu, Lechao Cheng, Jie Song, Mingli Song, Zunlei Feng

TL;DR
This paper introduces DeepFaceGen, a large-scale benchmark dataset of roughly 1.55 million real and forged face images and videos, designed to evaluate and improve face forgery detection methods across diverse conditions.
Contribution
The paper builds DeepFaceGen, a comprehensive and diverse benchmark dataset for face forgery detection, and uses it to evaluate 13 detection techniques.
Findings
Detection performance varies significantly across methods.
Content diversity and ethnicity fairness impact detection accuracy.
Insights suggest directions for improving face forgery detection.
Abstract
With the rapid development of AI-generated content (AIGC) technology, the production of realistic fake facial images and videos that deceive human visual perception has become possible. Consequently, various face forgery detection techniques have been proposed to identify such fake facial content. However, evaluating the effectiveness and generalizability of these detection techniques remains a significant challenge. To address this, we have constructed a large-scale evaluation benchmark called DeepFaceGen, aimed at quantitatively assessing the effectiveness of face forgery detection and facilitating the iterative development of forgery detection technology. DeepFaceGen consists of 776,990 real face image/video samples and 773,812 face forgery image/video samples, generated using 34 mainstream face generation techniques. During the construction process, we carefully consider important…
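The sample counts quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the constant and function names are hypothetical, and the per-generator average assumes an even split across the 34 generation techniques, which the abstract does not claim.

```python
# Counts as stated in the abstract.
REAL_SAMPLES = 776_990
FORGED_SAMPLES = 773_812
NUM_GENERATORS = 34  # face generation techniques behind the forged samples


def composition(real: int, forged: int) -> dict:
    """Return the total sample count and the real/forged class fractions."""
    total = real + forged
    return {
        "total": total,
        "real_fraction": real / total,
        "forged_fraction": forged / total,
    }


stats = composition(REAL_SAMPLES, FORGED_SAMPLES)

# Assumption for illustration: forged samples spread evenly across generators.
avg_forged_per_generator = FORGED_SAMPLES / NUM_GENERATORS

print(f"total samples:        {stats['total']:,}")
print(f"real fraction:        {stats['real_fraction']:.4f}")
print(f"forged fraction:      {stats['forged_fraction']:.4f}")
print(f"avg forged/generator: {avg_forged_per_generator:,.0f}")
```

The two classes are close to balanced, so a detector that always predicts one class would score near 50% accuracy on the full set.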
Peer Reviews
Decision·Submitted to ICLR 2025
This paper is easy to follow. This work covers large-scale deepfakes generated by a variety of methods, including recent prompt-guided deepfakes, which prior datasets lack. Since deepfake media are rapidly evolving, a reliable and cutting-edge benchmark has foreseeable value in the field of deepfake forensics.
While the authors present up to 13 findings, these findings are not impressive enough. The claimed findings have several shortcomings:
* Some findings seem to simply reassert well-known motivations from previous works, such as Finding 1 with MAT (Zhao et al., CVPR 2021).
* Some findings are common sense, such as Findings 2 and 7.
Furthermore, it is recommended that the authors consolidate a Findings list in the conclusion, as the presented Findings are dispersed throughout the 36 p
1. The paper constructs a large face forgery dataset, utilizing a variety of generation methods, including Prompt-guided Generation techniques such as Text2Image, Image2Image, and Text2Video, as well as Task-oriented Generation techniques. Compared to previous datasets, this dataset employs a more comprehensive and diverse range of generation methods, resulting in a significantly larger scale.
2. The paper employs 20 detection methods and conducts extensive experiments and analyses using DeepFa
1. The paper may not present a methodological contribution in itself, but the large and comprehensive dataset is valuable.
2. I have some questions regarding missing details. The authors selected 20 detection methods, including both image-level and video-level approaches, but did not clarify the representativeness of these methods. Additionally, while the authors conducted analyses of texture and frequency domain features, the chosen methods seem to focus more on detection models that utilize f
- This work implements 34 distinct face forgery techniques for creating fake data, although it does not add new real faces to the dataset.
- This work compares both spatial and temporal detection methods, which is new among existing face forgery benchmarks.
- Some findings are new to me, such as Finding 10.
**1. Unclear and ambiguous definitions for the forgery types:**
In this paper, face forgery techniques are categorized into two main types: prompt-guided and task-oriented. However, both of these concepts can be rather abstract and unclear, potentially misleading the community. Specifically:
- In Line 48, the author mentions for the first time that "current deepfake datasets focus on relatively outdated task-oriented based face forgery techniques." However, the meaning of "task-oriented-based fa
Taxonomy
Topics: Face recognition and analysis · Digital Media Forensic Detection · Biometric Identification and Security
