Self-supervised Transformer for Deepfake Detection

Hanqing Zhao; Wenbo Zhou; Dongdong Chen; Weiming Zhang; Nenghai Yu

arXiv:2203.01265·cs.CV·March 3, 2022·20 cites

TL;DR

This paper introduces a self-supervised transformer approach using audio-visual contrastive learning to improve deepfake detection, leveraging robust lip movement features without extensive labeled data.

Contribution

It proposes a novel self-supervised method for learning lip motion features via contrastive learning, reducing reliance on labeled datasets and enhancing deepfake detection performance.
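
As a rough illustration of audio-visual contrastive learning in general, the sketch below computes a symmetric InfoNCE-style loss over paired lip-video and audio embeddings. It is not the paper's exact objective, which this summary does not specify. The function names, the temperature value, and the use of plain Python lists as embeddings are all assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length (assumes vec is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce(anchors, candidates, temperature=0.1):
    """Mean cross-entropy of matching anchors[i] to candidates[i].

    Every other candidate in the batch acts as a negative.
    The temperature of 0.1 is an illustrative default, not a value from the paper.
    """
    a = [l2_normalize(v) for v in anchors]
    c = [l2_normalize(v) for v in candidates]
    total = 0.0
    for i, anchor in enumerate(a):
        # Cosine similarity of this anchor to every candidate, scaled by temperature.
        logits = [sum(p * q for p, q in zip(anchor, cand)) / temperature for cand in c]
        # Numerically stable log-sum-exp over the batch.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_denominator - logits[i]
    return total / len(a)


def symmetric_info_nce(video_feats, audio_feats, temperature=0.1):
    """Average the video-to-audio and audio-to-video losses."""
    return 0.5 * (
        info_nce(video_feats, audio_feats, temperature)
        + info_nce(audio_feats, video_feats, temperature)
    )


# Toy batch: two clips whose video and audio embeddings line up.
video = [[1.0, 0.0], [0.0, 1.0]]
audio_aligned = [[1.0, 0.0], [0.0, 1.0]]
audio_shuffled = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = symmetric_info_nce(video, audio_aligned)
loss_shuffled = symmetric_info_nce(video, audio_shuffled)
```

In this toy batch, `loss_aligned` comes out far smaller than `loss_shuffled`, which is the signal a contrastive objective of this kind uses to pull matching lip-motion and audio embeddings together without manual labels.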

Findings

01 The self-supervised method performs comparably to or better than supervised pre-training.

02 The approach improves robustness against post-processing operations such as compression.

03 Lip movement features are effective for deepfake detection.

Abstract

The fast evolution and widespread use of deepfake techniques in real-world scenarios require face forgery detectors with stronger generalization ability. Some works capture features that are unrelated to method-specific artifacts, such as clues from blending boundaries or accumulated up-sampling, to strengthen generalization. However, the effectiveness of these methods can be easily degraded by post-processing operations such as compression. Inspired by transfer learning, neural networks pre-trained on other large-scale face-related tasks may provide useful features for deepfake detection. For example, lip movement has been shown to be a robust, well-transferring high-level semantic feature, which can be learned from the lipreading task. However, the existing method pre-trains the lip feature extraction model in a supervised manner, which requires plenty of human…

Taxonomy

Topics: Face recognition and analysis · Speech and Audio Processing · Generative Adversarial Networks and Image Synthesis

Methods: Contrastive Learning