Exploring Temporal Coherence for More General Video Face Forgery Detection
Yinglin Zheng, Jianmin Bao, Dong Chen, Ming Zeng, Fang Wen

TL;DR
This paper introduces a novel end-to-end framework leveraging temporal coherence, combining a fully temporal convolution network and a Temporal Transformer, to improve general video face forgery detection.
Contribution
It proposes a new framework that effectively captures temporal features and long-term coherence without pre-training, enhancing forgery detection accuracy.
Findings
Outperforms existing face forgery detection methods.
Effective in detecting new types of face forgery videos.
No need for pre-training or external datasets.
Abstract
Although current face manipulation techniques achieve impressive quality and controllability, they still struggle to generate temporally coherent face videos. In this work, we explore how to take full advantage of temporal coherence for video face forgery detection. To this end, we propose a novel end-to-end framework consisting of two major stages. The first stage is a fully temporal convolution network (FTCN). The key insight of FTCN is to reduce the spatial convolution kernel size to 1 while keeping the temporal convolution kernel size unchanged. Surprisingly, we find that this design helps the model extract temporal features and also improves its generalization capability. The second stage is a Temporal Transformer network, which aims to explore long-term temporal coherence. The proposed framework is general and flexible,…
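The FTCN idea described in the abstract (spatial kernel size 1, temporal kernel size unchanged) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `temporal_only_conv`, the tensor layout `(C_in, T, H, W)`, and the choice of stride 1 with no padding are assumptions made for illustration only. The sketch shows that with a `(k_t, 1, 1)` kernel, each output pixel depends only on the time series at the same spatial location.

```python
import numpy as np


def temporal_only_conv(x, weight):
    """Convolve along time only, like a 3D conv with kernel (k_t, 1, 1).

    Hypothetical helper for illustration (not from the paper's code).
    x:      array of shape (C_in, T, H, W)
    weight: array of shape (C_out, C_in, k_t); the spatial kernel is 1x1
    Returns an array of shape (C_out, T - k_t + 1, H, W)
    (assumed: stride 1, no padding, cross-correlation convention).
    """
    c_out, c_in, k_t = weight.shape
    assert x.shape[0] == c_in, "channel mismatch between input and weight"
    t_out = x.shape[1] - k_t + 1
    out = np.zeros((c_out, t_out) + x.shape[2:], dtype=np.result_type(x, weight))
    for dt in range(k_t):
        # Mix channels at temporal offset dt; H and W are never mixed.
        out += np.einsum("oc,cthw->othw", weight[:, :, dt], x[:, dt:dt + t_out])
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8, 4, 4))   # (C_in, T, H, W)
w = rng.standard_normal((3, 2, 3))      # (C_out, C_in, k_t)
y = temporal_only_conv(x, w)
print(y.shape)  # (3, 6, 4, 4)

# Perturb a single spatial location across all frames and channels.
x_perturbed = x.copy()
x_perturbed[:, :, 1, 2] += 1.0
diff = np.abs(temporal_only_conv(x_perturbed, w) - y).sum(axis=(0, 1))
# Only location (1, 2) is expected to change, since no spatial mixing occurs.
print(np.argwhere(diff > 1e-9))
```

In a deep learning framework, the same effect would typically be obtained by setting the spatial dimensions of a 3D convolution's kernel size to 1. The abstract does not give layer counts, kernel lengths, or padding, so those are left unspecified here.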
Taxonomy
Topics: Digital Media Forensic Detection · Face recognition and analysis · Generative Adversarial Networks and Image Synthesis
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Softmax · Layer Normalization · Label Smoothing · Convolution
