Generalizing Video DeepFake Detection by Self-generated Audio-Visual Pseudo-Fakes
Zihe Wei, Yuezun Li

TL;DR
This paper introduces AVPF, a method that improves the generalizability of video DeepFake detection by training on self-generated audio-visual pseudo-fakes, boosting performance across diverse datasets without using any real DeepFakes.
Contribution
Proposes AVPF, a novel training approach that creates pseudo-fakes from authentic data to improve DeepFake detection generalization without relying on real DeepFake samples.
Findings
Achieves performance improvements of up to 7.4% across multiple datasets.
Demonstrates strong generalizability of AVPF across various scenarios.
Does not require real DeepFake data for training.
Abstract
Detecting video deepfakes has become increasingly urgent in recent years. Given the audio-visual information in videos, existing methods typically expose deepfakes by modeling cross-modal correspondence using specifically designed architectures with publicly available datasets. While they have shown promising results, their effectiveness often degrades in real-world scenarios, as the limited diversity of training datasets naturally restricts generalizability to unseen cases. To address this, we propose a simple yet effective method, called AVPF, which can notably enhance model generalizability by training with self-generated Audio-Visual Pseudo-Fakes. The key idea of AVPF is to create pseudo-fake training samples that contain diverse audio-visual correspondence patterns commonly observed in real-world deepfakes. We highlight that AVPF is generated solely from authentic samples, and…
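As an illustration only, here is a minimal sketch of the general idea of deriving a fake-labelled training sample from an authentic clip alone. It assumes one simple way of breaking audio-visual correspondence: circularly shifting the clip's own audio track by a random offset so that lip motion and speech drift out of sync. This is not the AVPF recipe; the helper names (shift_audio, make_pseudo_fake_batch), the REAL/FAKE labels, and the 0.2 to 1.0 s shift range are all hypothetical choices made for this example.

```python
import numpy as np

# Hypothetical labels for a binary real/fake detector.
REAL, FAKE = 0, 1


def shift_audio(audio: np.ndarray, sample_rate: int, rng: np.random.Generator,
                min_shift_s: float = 0.2, max_shift_s: float = 1.0) -> np.ndarray:
    """Circularly shift a 1-D audio track by a random offset (hypothetical helper).

    The waveform content is unchanged; only its timing relative to the video
    frames moves, so the clip's audio-visual correspondence is broken.
    """
    shift_s = rng.uniform(min_shift_s, max_shift_s)   # offset in seconds
    direction = rng.choice([-1, 1])                  # audio leads or lags the video
    shift = int(round(shift_s * sample_rate)) * int(direction)
    return np.roll(audio, shift)


def make_pseudo_fake_batch(frames_list, audio_list, sample_rate: int, seed: int = 0):
    """Pair every authentic clip with one pseudo-fake built from that same clip.

    Returns a list of (frames, audio, label) tuples. The frames are reused
    as-is; only the audio of the pseudo-fake copy is perturbed.
    """
    rng = np.random.default_rng(seed)
    samples = []
    for frames, audio in zip(frames_list, audio_list):
        samples.append((frames, audio, REAL))
        samples.append((frames, shift_audio(audio, sample_rate, rng), FAKE))
    return samples
```

For example, a 2-second clip sampled at 16 kHz yields two training samples: the original pair labelled REAL and a desynchronized copy labelled FAKE. Because every pseudo-fake comes from a real clip, the detector has no generator-specific artifacts to latch onto and must rely on the audio-visual mismatch itself. This matches the motivation stated in the abstract, but the transformations AVPF actually uses may differ.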
