TL;DR
GazeCLIP introduces a gaze-guided CLIP model with adaptive language prompts to improve deepfake attribution and detection, especially on unseen forgery methods, by leveraging gaze differences and dynamic language refinement.
Contribution
The paper proposes a novel gaze-aware CLIP framework with adaptive language prompts and a new benchmark for fine-grained deepfake attribution and detection.
Findings
Outperforms the state of the art by 6.56% ACC and 5.32% AUC on the benchmark.
Utilizes gaze differences to enhance generalization to unseen forgery methods.
Introduces a gaze-aware image encoder and dynamic language refinement for better vision-language matching.
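The vision-language matching mentioned above can be illustrated with a minimal, generic CLIP-style sketch: an image embedding is compared against one text-prompt embedding per class by cosine similarity, and the scaled similarities are turned into class probabilities with a softmax. This is not the paper's implementation. The function names, the toy three-dimensional embeddings, the class labels, and the temperature of 0.07 are all assumptions chosen for illustration, and the gaze-aware encoder and prompt refinement are not modeled here.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def match_prompts(image_emb, prompt_embs, temperature=0.07):
    """Return softmax probabilities over prompts for one image embedding.

    The temperature value is an assumption for this sketch, not a value
    taken from the paper.
    """
    img = l2_normalize(image_emb)
    logits = []
    for emb in prompt_embs:
        txt = l2_normalize(emb)
        cosine = sum(a * b for a, b in zip(img, txt))
        logits.append(cosine / temperature)
    # Subtract the max logit before exponentiating for numerical stability.
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical class prompts and toy embeddings (stand-ins for encoder outputs).
labels = ["real face", "diffusion-generated face", "flow-generated face"]
image_emb = [0.9, 0.1, 0.0]
prompt_embs = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

probs = match_prompts(image_emb, prompt_embs)
predicted = labels[probs.index(max(probs))]
print(predicted, probs)
```

In a setup like this, attribution corresponds to picking the best-matching generator prompt, while detection reduces to a real-versus-fake decision over the same scores.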
Abstract
Current deepfake attribution and detection methods tend to generalize poorly to novel generative methods because they explore visual modalities alone. They also assess attribution or detection performance on unseen advanced generators only coarsely, and fail to consider the synergy between the two tasks. To this end, we propose a novel gaze-guided CLIP with adaptive-enhanced fine-grained language prompts for fine-grained deepfake attribution and detection (DFAD). Specifically, we construct a novel, fine-grained benchmark to evaluate the DFAD performance of networks on novel generators such as diffusion and flow models. Additionally, we introduce a gaze-aware model based on CLIP, devised to enhance generalization to unseen face forgery attacks. Built upon the novel observation that there are significant distribution differences between…
