Latent Spatiotemporal Adaptation for Generalized Face Forgery Video Detection

Daichi Zhang; Zihao Xiao; Jianmin Li; Shiming Ge

arXiv:2309.04795 · cs.CV · October 25, 2024 · 2 cites

Open Access

TL;DR

This paper introduces LAST, a novel method that models and adapts to the spatiotemporal patterns of face forgery videos in latent space, significantly improving generalization to unseen forgery methods.

Contribution

The paper proposes a latent spatiotemporal adaptation approach that enhances face forgery detection generalization by modeling patterns in latent space and using semi-supervised learning.

Findings

01. Achieves state-of-the-art results on public datasets.

02. Demonstrates strong generalization to unseen forgery methods.

03. Pre-training with self-supervised tasks improves robustness.

Abstract

Face forgery videos have raised serious public concern, and many detectors have been proposed. However, most of these detectors generalize poorly to videos from unknown distributions, such as those produced by unseen forgery methods. In this paper, we find that different forgery videos have distinct spatiotemporal patterns, which may be the key to generalization. To leverage this finding, we propose a Latent Spatiotemporal Adaptation (LAST) approach to facilitate generalized face forgery video detection. The key idea is to optimize the detector so that it adapts to the spatiotemporal patterns of unknown videos in latent space, thereby improving generalization. Specifically, we first model the spatiotemporal patterns of face videos by incorporating a lightweight CNN to extract local spatial features of each frame and then cascading a vision transformer to learn the long-term…
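The modeling stage described in the abstract (a per-frame spatial feature extractor followed by a transformer over the frame sequence) can be illustrated with a toy NumPy sketch. This is not the authors' architecture: the pooling-plus-projection step is only a stand-in for the lightweight CNN, a single self-attention layer stands in for the vision transformer, and every name, shape, and weight here (`frame_features`, `temporal_self_attention`, `forgery_score`, the 16-dimensional latent) is a hypothetical choice for illustration. The weights are random, so the score carries no meaning; the sketch only shows how the data flows.

```python
import numpy as np

rng = np.random.default_rng(0)

def frame_features(video, w_proj, patch=8):
    """Toy stand-in for a lightweight CNN (assumption, not the paper's model):
    average-pool each frame into patch x patch cells, flatten, project, ReLU."""
    t, h, w = video.shape
    pooled = video.reshape(t, h // patch, patch, w // patch, patch).mean(axis=(2, 4))
    return np.maximum(pooled.reshape(t, -1) @ w_proj, 0.0)  # shape (t, d)

def temporal_self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention across frames, with a residual connection."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=-1, keepdims=True)  # each row sums to 1
    return x + attn @ v, attn

def forgery_score(video, params):
    """Frame features -> temporal attention -> mean-pool -> linear head -> sigmoid."""
    feats = frame_features(video, params["proj"])
    latent, attn = temporal_self_attention(feats, params["q"], params["k"], params["v"])
    logit = latent.mean(axis=0) @ params["head"]
    return 1.0 / (1.0 + np.exp(-logit)), attn

def init_params(n_cells, d=16, scale=0.1):
    """Random, untrained weights (hypothetical sizes)."""
    return {
        "proj": rng.normal(0.0, scale, (n_cells, d)),
        "q": rng.normal(0.0, scale, (d, d)),
        "k": rng.normal(0.0, scale, (d, d)),
        "v": rng.normal(0.0, scale, (d, d)),
        "head": rng.normal(0.0, scale, d),
    }

# A fake grayscale clip: 12 frames of 32 x 32 pixels.
video = rng.random((12, 32, 32))
params = init_params(n_cells=(32 // 8) ** 2)
score, attn = forgery_score(video, params)
```

The attention matrix `attn` has one row per frame, so each frame's latent representation is a weighted mix of all frames in the clip; this is the sense in which the second stage can capture longer-range temporal structure than a per-frame model.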

Taxonomy

Topics: Digital Media Forensic Detection · Face recognition and analysis · Generative Adversarial Networks and Image Synthesis

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Residual Connection · Adam · Byte Pair Encoding · Softmax · Dropout · Label Smoothing · Absolute Position Encodings