Learning Natural Consistency Representation for Face Forgery Video Detection

Daichi Zhang; Zihao Xiao; Shikun Li; Fanzhao Lin; Jianmin Li; Shiming Ge

arXiv:2407.10550·cs.CV·July 16, 2024·1 cites

Open Access

TL;DR

This paper introduces a self-supervised method called NACO that learns natural spatiotemporal consistency representations from real face videos to improve face forgery detection, enhancing robustness and generalization.

Contribution

The paper proposes a novel self-supervised learning framework combining CNNs and Transformers with spatial and temporal modules for face forgery detection.

Findings

01. Outperforms state-of-the-art methods in detection accuracy

02. Demonstrates strong generalization to unseen forgery methods

03. Shows robustness against various perturbations

Abstract

Face forgery videos have raised serious public concern, and various detectors have been proposed. However, fully supervised detectors tend to overfit to specific forgery methods or videos, and existing self-supervised detectors place strict requirements on their auxiliary tasks, such as needing audio or multiple modalities, which limits generalization and robustness. In this paper, we examine whether this issue can be addressed using visual-only real face videos. To this end, we propose to learn the Natural Consistency representation (NACO) of real face videos in a self-supervised manner, inspired by the observation that fake videos struggle to maintain natural spatiotemporal consistency, even under unknown forgery methods and different perturbations. NACO first extracts spatial features of each frame with CNNs, then integrates them into a Transformer to learn…
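
The one pipeline step the abstract spells out (per-frame spatial features, then a Transformer over the frame sequence) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the average-pooling stand-in for the CNN, the single-head attention with identity projections, and the names `frame_features`, `temporal_self_attention`, and `encode_video` are all assumptions made for illustration.

```python
import math
import random


def frame_features(frame, grid=2):
    """Hypothetical stand-in for the CNN: average-pool a grayscale
    frame (list of rows) into grid * grid values."""
    h, w = len(frame), len(frame[0])
    bh, bw = h // grid, w // grid
    feats = []
    for gy in range(grid):
        for gx in range(grid):
            block = [frame[y][x]
                     for y in range(gy * bh, (gy + 1) * bh)
                     for x in range(gx * bw, (gx + 1) * bw)]
            feats.append(sum(block) / len(block))
    return feats


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_self_attention(tokens):
    """Single-head scaled dot-product self-attention across frame tokens.
    Assumption: identity query/key/value projections, no learned weights."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens))
                    for i in range(d)])
    return out


def encode_video(frames):
    """Spatial step per frame, then temporal mixing across frames."""
    tokens = [frame_features(f) for f in frames]
    return temporal_self_attention(tokens)


# Toy input: 4 random 8x8 grayscale frames with pixel values in [0, 1).
random.seed(0)
video = [[[random.random() for _ in range(8)] for _ in range(8)]
         for _ in range(4)]
encoded = encode_video(video)  # one 4-dimensional vector per frame
```

Each output vector is an attention-weighted mix of all frame tokens, which is the mechanism that lets a temporal module relate frames to one another. A real model would use learned convolutional filters and learned attention projections in place of these fixed operations.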

Taxonomy

Topics: Face recognition and analysis · Digital Media Forensic Detection · Generative Adversarial Networks and Image Synthesis

Methods: Attention Is All You Need · Byte Pair Encoding · Layer Normalization · Linear Layer · Label Smoothing · Adam · Dropout · Multi-Head Attention · Dense Connections · Softmax