A Large-scale Universal Evaluation Benchmark For Face Forgery Detection

Yijun Bei, Hengrui Lou, Jinsong Geng, Erteng Liu, Lechao Cheng, Jie Song, Mingli Song, Zunlei Feng

arXiv:2406.09181 · cs.CV · June 17, 2024 · 1 citation

TL;DR

This paper introduces DeepFaceGen, a large-scale benchmark dataset of about 1.55 million real and forged face images and videos, designed to evaluate and improve face forgery detection methods across diverse conditions.

Contribution

The creation of DeepFaceGen, a comprehensive and diverse benchmark dataset for face forgery detection, and its use to evaluate 13 detection techniques, advancing research in this field.

Findings

01

Detection performance varies significantly across methods.

02

Content diversity and ethnicity fairness impact detection accuracy.

03

Insights suggest directions for improving face forgery detection.

Abstract

With the rapid development of AI-generated content (AIGC) technology, the production of realistic fake facial images and videos that deceive human visual perception has become possible. Consequently, various face forgery detection techniques have been proposed to identify such fake facial content. However, evaluating the effectiveness and generalizability of these detection techniques remains a significant challenge. To address this, we have constructed a large-scale evaluation benchmark called DeepFaceGen, aimed at quantitatively assessing the effectiveness of face forgery detection and facilitating the iterative development of forgery detection technology. DeepFaceGen consists of 776,990 real face image/video samples and 773,812 face forgery image/video samples, generated using 34 mainstream face generation techniques. During the construction process, we carefully consider important…

Peer Reviews

Decision · Submitted to ICLR 2025

Reviewer 01 · Rating 5 · Confidence 5

Strengths

This paper is easy to follow. This work covers large-scale deepfakes generated by a variety of methods, including recent prompt-guided deepfakes, whose absence is a limitation of prior datasets. Since deepfake media are evolving rapidly, a reliable and up-to-date benchmark has clear value for the field of deepfake forensics.

Weaknesses

While the authors present up to 13 findings, these findings are not impressive enough. The claimed findings have some shortcomings, including:
* Some findings seem to simply restate well-known motivations from previous works, such as Finding 1 with MAT (Zhao et al., CVPR 2021).
* Some findings are common sense, such as Findings 2 and 7.
Furthermore, it is recommended that the authors consolidate the findings into a single list in the conclusion, as the presented findings are dispersed throughout the 36 p

Reviewer 02 · Rating 6 · Confidence 3

Strengths

1. The paper constructs a large face forgery dataset using a variety of generation methods, including prompt-guided generation techniques such as Text2Image, Image2Image, and Text2Video, as well as task-oriented generation techniques. Compared to previous datasets, this dataset employs a more comprehensive and diverse range of generation methods and is significantly larger in scale.
2. The paper employs 20 detection methods and conducts extensive experiments and analyses using DeepFa

Weaknesses

1. The paper may not present a methodological contribution in itself, but the large and comprehensive dataset is valuable.
2. I have some questions regarding missing details. The authors selected 20 detection methods, including both image-level and video-level approaches, but did not clarify how representative these methods are. Additionally, while the authors conducted analyses of texture and frequency-domain features, the chosen methods seem to focus more on detection models that utilize f

Reviewer 03 · Rating 5 · Confidence 5

Strengths

- This work implements 34 distinct face forgery techniques to create fake data, although it does not create new real faces for its dataset.
- This work includes both spatial and temporal detection methods for comparison, which is new among existing face forgery benchmarks.
- Some findings are new to me, such as Finding 10.

Weaknesses

**1. Unclear and ambiguous definitions of the forgery types:** In this paper, face forgery techniques are categorized into two main types: prompt-guided and task-oriented. However, both concepts can be rather abstract and unclear, potentially misleading the community. Specifically:
- In Line 48, the authors mention for the first time that "current deepfake datasets focus on relatively outdated task-oriented based face forgery techniques." However, the meaning of "task-oriented-based fa

Code & Models

Repositories

hengruilou/deepfacegen
PyTorch · Official

Taxonomy

Topics: Face recognition and analysis · Digital Media Forensic Detection · Biometric Identification and Security