GazeCLIP: Gaze-Guided CLIP with Adaptive-Enhanced Fine-Grained Language Prompt for Deepfake Attribution and Detection

Yaning Zhang; Linlin Shen; Zitong Yu; Chunjie Ma; Zan Gao

arXiv:2603.29295 · cs.CV · April 1, 2026



TL;DR

GazeCLIP introduces a gaze-guided CLIP model with adaptive language prompts to improve deepfake attribution and detection, especially on unseen forgery methods, by leveraging gaze differences and dynamic language refinement.

Contribution

The paper proposes a novel gaze-aware CLIP framework with adaptive language prompts and a new benchmark for fine-grained deepfake attribution and detection.

Findings

1. Outperforms the state of the art by 6.56% in ACC and 5.32% in AUC on the benchmark.
2. Uses gaze differences to improve generalization to unseen forgery methods.
3. Introduces a gaze-aware image encoder and dynamic language refinement for better vision-language matching.
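This page does not say how GazeCLIP scores an image against its language prompts. As a rough illustration of what "vision-language matching" means in a CLIP-style model, here is a minimal sketch of generic CLIP zero-shot matching, not the paper's gaze-aware encoder or prompt refinement. Each class gets one text-prompt embedding, and the image embedding is compared with every prompt by cosine similarity scaled by a temperature, followed by a softmax. The class list, the toy 3-d embeddings, and the `temperature=0.07` value are hypothetical assumptions chosen for illustration.

```python
import math

# Hypothetical attribution classes; the benchmark's real label set is not listed here.
CLASSES = ["real", "GAN", "diffusion", "flow"]


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def match_prompts(image_emb, prompt_embs, temperature=0.07):
    """Generic CLIP-style matching (assumed, not GazeCLIP's exact method):
    softmax over temperature-scaled cosine similarities between one image
    embedding and one text-prompt embedding per class."""
    img = l2_normalize(image_emb)
    logits = []
    for p in prompt_embs:
        txt = l2_normalize(p)
        cos = sum(a * b for a, b in zip(img, txt))
        logits.append(cos / temperature)
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy embeddings (hypothetical); real CLIP embeddings have hundreds of dimensions.
image = [0.1, 0.9, 0.2]
prompts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.5, 0.5, 0.0]]
probs = match_prompts(image, prompts)
best = max(range(len(probs)), key=probs.__getitem__)
print(CLASSES[best])  # expected: GAN
```

A low temperature sharpens the softmax, so small cosine differences between prompts become confident class probabilities; in a trained model the prompt embeddings come from the text encoder rather than being fixed by hand.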

Abstract

Current deepfake attribution and deepfake detection methods tend to generalize poorly to novel generative methods because their exploration has been limited to the visual modality alone. They also evaluate attribution or detection performance on unseen advanced generators only coarsely, and fail to consider the synergy between the two tasks. To this end, we propose a novel gaze-guided CLIP with adaptive-enhanced fine-grained language prompts for fine-grained deepfake attribution and detection (DFAD). Specifically, we construct a novel, fine-grained benchmark to evaluate the DFAD performance of networks on novel generators such as diffusion and flow models. Additionally, we introduce a gaze-aware model based on CLIP, designed to enhance generalization to unseen face forgery attacks. Built upon the novel observation that there are significant distribution differences between…
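The abstract stresses the synergy between attribution and detection, and the findings report results in ACC and AUC. One simple way to couple the two tasks, assumed here for illustration and not confirmed as the paper's protocol, is to take a multi-class attribution softmax, report attribution accuracy from its argmax, and derive a binary detection score as 1 − P(real), which is then scored with ROC AUC. The sketch computes AUC with the rank (Mann-Whitney) formulation. The sample probabilities and labels are hypothetical.

```python
def detection_score(attr_probs, real_index=0):
    """Assumed coupling: fake probability = 1 - P(real) under the attribution softmax."""
    return 1.0 - attr_probs[real_index]


def accuracy(preds, labels):
    """Fraction of predictions equal to the ground-truth labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def auc(scores, labels):
    """ROC AUC via Mann-Whitney: probability that a random fake (label 1)
    scores higher than a random real (label 0); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical attribution outputs over classes [real, GAN, diffusion, flow].
attr_probs = [
    [0.8, 0.1, 0.05, 0.05],
    [0.3, 0.5, 0.1, 0.1],
    [0.6, 0.2, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
]
attr_labels = [0, 1, 1, 2]

preds = [max(range(len(p)), key=p.__getitem__) for p in attr_probs]
det_scores = [detection_score(p) for p in attr_probs]
det_labels = [0 if y == 0 else 1 for y in attr_labels]  # any non-real class counts as fake

print(accuracy(preds, attr_labels))  # 0.75: the third sample is misattributed as real
print(auc(det_scores, det_labels))   # 1.0: every fake scores above the single real image
```

The example also shows why the two metrics can disagree: the third image is misattributed (it lowers ACC), yet its detection score still ranks above the real image, so detection AUC is unaffected.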


Code & Models

Repositories

GitHub
