Sharp Multiple Instance Learning for DeepFake Video Detection
Xiaodan Li, Yining Lang, Yuefeng Chen, Xiaofeng Mao, Yuan He, Shuhui Wang, Hui Xue, Quan Lu

TL;DR
This paper introduces a sharp multiple instance learning (S-MIL) framework for detecting partially manipulated DeepFake videos when only weak, video-level labels are available. It combines spatial-temporal encoding of faces with a theoretical analysis of the method.
Contribution
The paper proposes S-MIL, a method that maps instance embeddings directly to video-level predictions and thereby alleviates gradient vanishing, and introduces a new dataset for partial DeepFake detection.
Findings
S-MIL outperforms existing methods on the FFPMS and DFDC datasets.
Spatial-temporal encoding enhances detection of manipulated faces.
S-MIL also achieves state-of-the-art results on single-frame DeepFake detection.
Abstract
With the rapid development of facial manipulation techniques, face forgery has received considerable attention in the multimedia and computer vision communities due to security concerns. Existing methods are mostly designed for single-frame detection trained with precise image-level labels, or for video-level prediction that models only inter-frame inconsistency, leaving potentially high risks from DeepFake attackers. In this paper, we introduce a new problem of partial face attack in DeepFake video, where only video-level labels are provided but not all the faces in the fake videos are manipulated. We address this problem with a multiple instance learning (MIL) framework, treating faces and the input video as instances and bag, respectively. A sharp MIL (S-MIL) is proposed, which builds a direct mapping from instance embeddings to bag prediction, rather than from instance embeddings to instance prediction and…
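To make the distinction in the abstract concrete, the sketch below contrasts two ways of turning per-face embeddings into one video-level score. It is not the paper's S-MIL formulation, which this page does not give. As a stand-in it uses generic softmax-weighted pooling of instance embeddings followed by a single bag-level classifier. The names `instance_then_bag`, `direct_bag_prediction`, `w`, `b`, and `attn` are hypothetical and chosen only for illustration.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, mapping real scores to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def instance_then_bag(embeddings, w, b):
    """Two-step MIL: score every instance, then max-pool the instance scores.

    Only the highest-scoring instance determines the bag prediction.
    """
    instance_probs = sigmoid(embeddings @ w + b)  # shape: (num_instances,)
    return instance_probs.max()


def direct_bag_prediction(embeddings, w, b, attn):
    """Direct mapping from instance embeddings to a bag prediction.

    ASSUMPTION: generic softmax-weighted pooling, used here as an
    illustrative stand-in and not the S-MIL aggregation from the paper.
    """
    scores = embeddings @ attn                # one relevance score per instance
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    bag_embedding = weights @ embeddings      # weighted mean of the embeddings
    return sigmoid(bag_embedding @ w + b), weights


# Toy bag: 5 face embeddings of dimension 4 (random values, illustration only).
rng = np.random.default_rng(0)
faces = rng.normal(size=(5, 4))
w, attn = rng.normal(size=4), rng.normal(size=4)

video_prob_max = instance_then_bag(faces, w, 0.0)
video_prob_direct, face_weights = direct_bag_prediction(faces, w, 0.0, attn)
```

In the first function the video score depends on a single face, while in the second every face contributes to the video score through its pooling weight. That is one way to picture a mapping that goes from instance embeddings straight to the bag prediction.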
