Self-supervised Transformer for Deepfake Detection
Hanqing Zhao, Wenbo Zhou, Dongdong Chen, Weiming Zhang, Nenghai Yu

TL;DR
This paper introduces a self-supervised transformer approach using audio-visual contrastive learning to improve deepfake detection, leveraging robust lip movement features without extensive labeled data.
Contribution
It proposes a novel self-supervised method for learning lip motion features via contrastive learning, reducing reliance on labeled datasets and enhancing deepfake detection performance.
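This page does not state the paper's exact training objective. A common way to realize audio-visual contrastive learning is a symmetric InfoNCE loss. The visual (lip) embedding and the audio embedding of the same clip form a positive pair, and every other clip in the batch serves as a negative. The sketch below assumes that formulation. The function names, the temperature of 0.07, and the use of NumPy are illustrative choices, not details taken from the paper.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Scale each embedding to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def audio_visual_info_nce(video_emb, audio_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired clips (illustrative sketch).

    video_emb, audio_emb: arrays of shape (batch, dim). Row i of each array
    comes from the same clip (positive pair); all other rows act as negatives.
    The temperature value is an assumed hyperparameter, not from the paper.
    """
    v = l2_normalize(video_emb)
    a = l2_normalize(audio_emb)
    logits = v @ a.T / temperature  # (batch, batch) cosine-similarity matrix
    idx = np.arange(logits.shape[0])

    def cross_entropy(lg):
        # Numerically stable row-wise log-softmax, then keep the diagonal
        # (matching-pair) entries as the targets.
        lg = lg - lg.max(axis=1, keepdims=True)
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[idx, idx].mean()

    # Average the video-to-audio and audio-to-video directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

In an actual pipeline, `video_emb` would come from an encoder over the mouth region and `audio_emb` from an audio encoder. Both encoders are placeholders here. Correctly paired embeddings yield a lower loss than mismatched ones, and this is the signal that lets lip-motion features be learned without human labels.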
Findings
The self-supervised method performs comparably to, or better than, supervised pre-training.
The approach improves robustness to post-processing operations such as compression.
Lip movement features are effective for deepfake detection.
Abstract
The fast evolution and widespread use of deepfake techniques in real-world scenarios require face forgery detectors with stronger generalization abilities. Some works strengthen generalization by capturing features that are unrelated to method-specific artifacts, such as blending-boundary clues and accumulated up-sampling traces. However, the effectiveness of these methods can easily be corrupted by post-processing operations such as compression. Following the idea of transfer learning, neural networks pre-trained on other large-scale face-related tasks may provide useful features for deepfake detection. For example, lip movement has been shown to be a robust, well-transferring high-level semantic feature, which can be learned from the lipreading task. However, the existing method pre-trains the lip feature extraction model in a supervised manner, which requires plenty of human…
Taxonomy
Topics: Face recognition and analysis · Speech and Audio Processing · Generative Adversarial Networks and Image Synthesis
Methods: Contrastive Learning
