Standing on the Shoulders of Giants: Reprogramming Visual-Language Model for General Deepfake Detection
Kaiqing Lin, Yuzhen Lin, Weixiang Li, Taiping Yao, Bin Li

TL;DR
This paper introduces a deepfake detection method that reprograms pre-trained vision-language models (VLMs) such as CLIP through input perturbations. It improves cross-dataset and cross-manipulation detection without tuning the model's internal parameters.
Contribution
It proposes a reprogramming approach that adapts a pre-trained VLM for general deepfake detection by manipulating only its input, which improves generalization to unseen forgeries and reduces training complexity.
Findings
Over 88% AUC in cross-dataset detection
Significant performance improvements across multiple benchmarks
Fewer trainable parameters needed for effective detection
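For readers unfamiliar with the metric behind the first finding, AUC can be read as the probability that a randomly chosen fake sample receives a higher score than a randomly chosen real one. The sketch below computes it by direct pair counting. The function name `roc_auc` and the four toy scores are made up for illustration and are not results from the paper.

```python
def roc_auc(labels, scores):
    """Pairwise AUC: fraction of (fake, real) pairs ranked correctly; ties count half."""
    fake = [s for l, s in zip(labels, scores) if l == 1]
    real = [s for l, s in zip(labels, scores) if l == 0]
    if not fake or not real:
        raise ValueError("need at least one sample of each class")
    wins = sum((f > r) + 0.5 * (f == r) for f in fake for r in real)
    return wins / (len(fake) * len(real))


# Toy data (hypothetical): label 1 = fake, 0 = real; scores are detector outputs.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)
```

Pair counting is quadratic in the number of samples, so it suits small illustrations; evaluation on full benchmarks would normally use a rank-based implementation.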
Abstract
The proliferation of deepfake faces poses huge potential negative impacts on our daily lives. Despite substantial advancements in deepfake detection in recent years, the generalizability of existing methods against forgeries from unseen datasets or created by emerging generative models remains constrained. In this paper, inspired by the zero-shot advantages of Vision-Language Models (VLMs), we propose a novel approach that repurposes a well-trained VLM for general deepfake detection. Motivated by the model reprogramming paradigm that manipulates the model prediction via input perturbations, our method can reprogram a pre-trained VLM (e.g., CLIP) solely by manipulating its input, without tuning the inner parameters. First, learnable visual perturbations are used to refine feature extraction for deepfake detection. Then, we exploit information of face embedding to create…
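The first step the abstract describes, learning a visual perturbation while the VLM itself stays frozen, can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the frozen model is a small random linear "image encoder" `W` with two fixed "text embeddings" `TEXT` (not CLIP), the perturbation `delta` is a single additive vector shared by all inputs, the data are synthetic, and similarity is a plain dot product. The second step, which builds on face embeddings, is not covered because the abstract is truncated at that point.

```python
import math
import random

random.seed(0)
D_IN, D_EMB = 8, 4  # toy sizes; real inputs are full face images

# Frozen stand-ins for a pre-trained VLM (hypothetical, not CLIP itself):
# a linear "image encoder" W and one fixed "text embedding" per class.
W = [[random.gauss(0, 0.3) for _ in range(D_IN)] for _ in range(D_EMB)]
TEXT = [[random.gauss(0, 0.3) for _ in range(D_EMB)] for _ in range(2)]  # 0 = real, 1 = fake


def class_probs(x, delta):
    """Softmax over image-text similarity for the perturbed input x + delta."""
    z = [sum(w * (xi + di) for w, xi, di in zip(row, x, delta)) for row in W]
    logits = [sum(t * zi for t, zi in zip(text, z)) for text in TEXT]
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    return [e / sum(exps) for e in exps]


def loss_and_grad(data, delta):
    """Mean cross-entropy and its gradient with respect to delta only."""
    loss, grad = 0.0, [0.0] * D_IN
    for x, y in data:
        p = class_probs(x, delta)
        loss -= math.log(max(p[y], 1e-12))
        # Backpropagate through the frozen similarity and the frozen encoder.
        dz = [sum((p[k] - (k == y)) * TEXT[k][i] for k in range(2)) for i in range(D_EMB)]
        for j in range(D_IN):
            grad[j] += sum(W[i][j] * dz[i] for i in range(D_EMB))
    n = len(data)
    return loss / n, [g / n for g in grad]


def make_sample(label):
    """Synthetic input: 'fake' samples are shifted up, 'real' samples down."""
    shift = 1.0 if label == 1 else -1.0
    return [random.gauss(shift, 1.0) for _ in range(D_IN)], label


data = [make_sample(i % 2) for i in range(40)]
delta = [0.0] * D_IN  # the only trainable parameters
initial_loss, _ = loss_and_grad(data, delta)
for _ in range(200):
    _, grad = loss_and_grad(data, delta)
    delta = [d - 0.5 * g for d, g in zip(delta, grad)]  # W and TEXT are never updated
final_loss, _ = loss_and_grad(data, delta)
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

Only the `D_IN` values of `delta` are trained here, which mirrors the idea of adapting a frozen model through its input alone. A shared additive vector on a linear encoder is far weaker than perturbations applied to a real VLM, so the sketch shows the mechanism and not the reported performance.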
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis
Methods: Contrastive Language-Image Pre-training
