Generalizing Deepfake Video Detection with Plug-and-Play: Video-Level Blending and Spatiotemporal Adapter Tuning
Zhiyuan Yan, Yandan Zhao, Shen Chen, Mingyi Guo, Xinghe Fu, Taiping Yao, Shouhong Ding, Li Yuan

TL;DR
This paper introduces a deepfake video detection approach that combines video-level blending data with a lightweight spatiotemporal adapter, aiming to improve generalization across diverse forgeries, balance the learning of spatial and temporal artifacts, and keep the model efficient.
Contribution
The paper proposes a technique for producing video-level blending data and a lightweight spatiotemporal adapter, both intended to improve the generalization and efficiency of deepfake detection models.
Findings
Effective generalization to unseen forgeries
Balanced learning of spatial and temporal artifacts
Improved efficiency with lightweight model design
Abstract
Three key challenges hinder the development of current deepfake video detection: (1) Temporal features can be complex and diverse: how can we identify general temporal artifacts to enhance model generalization? (2) Spatiotemporal models often lean heavily on one type of artifact and ignore the other: how can we ensure balanced learning from both? (3) Videos are naturally resource-intensive: how can we tackle efficiency without compromising accuracy? This paper attempts to tackle the three challenges jointly. First, inspired by the notable generality of using image-level blending data for image forgery detection, we investigate whether and how video-level blending can be effective in video. We then perform a thorough analysis and identify a previously underexplored temporal forgery artifact: Facial Feature Drift (FFD), which commonly exists across different forgeries. To reproduce FFD,…
Taxonomy
Topics: Digital Media Forensic Detection · Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques
Methods: Adapter
