ViGText: Deepfake Image Detection with Vision-Language Model Explanations and Graph Neural Networks

Ahmad ALBarqawi; Mahmoud Nazzal; Issa Khalil; Abdallah Khreishah; NhatHai Phan

arXiv:2507.18031 · cs.CV · February 23, 2026

Open Access

TL;DR

ViGText introduces a novel graph-based deepfake detection method that combines vision-language explanations and multi-level feature extraction, significantly improving generalization and robustness against sophisticated deepfakes.

Contribution

The paper presents ViGText, a new approach integrating image patches, text explanations, and graph neural networks for enhanced deepfake detection.

Findings

01. F1 score improved from 72.45% to 98.32% under generalization.

02. Recall increased by 11.1% compared to other methods.

03. Classification performance degradation limited to less than 4% under targeted attacks.

Abstract

The rapid rise of deepfake technology, which produces realistic but fraudulent digital content, threatens the authenticity of media. Traditional deepfake detection approaches often struggle with sophisticated, customized deepfakes, especially in terms of generalization and robustness against malicious attacks. This paper introduces ViGText, a novel approach that integrates images with Vision Large Language Model (VLLM) Text explanations within a Graph-based framework to improve deepfake detection. The novelty of ViGText lies in its integration of detailed explanations with visual data, as it provides a more context-aware analysis than captions, which often lack specificity and fail to reveal subtle inconsistencies. ViGText systematically divides images into patches, constructs image and text graphs, and integrates them for analysis using Graph Neural Networks (GNNs) to identify…
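The pipeline named in the abstract (patches, an image graph, a text graph, and a GNN over their combination) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. Everything below is an assumption made for illustration: the 4-neighbour patch connectivity, the chain-shaped text graph, the fully connected cross-modal edges, the random stand-in features, the example explanation sentence, and the single untrained mean-aggregation step that takes the place of a real GNN.

```python
import numpy as np

def split_into_patches(image, patch):
    """Split an H x W x C image into non-overlapping patch x patch tiles."""
    h, w, c = image.shape
    rows, cols = h // patch, w // patch
    tiles = image[:rows * patch, :cols * patch].reshape(rows, patch, cols, patch, c)
    tiles = tiles.transpose(0, 2, 1, 3, 4)  # (rows, cols, patch, patch, c)
    return tiles.reshape(rows * cols, -1), rows, cols

def grid_edges(rows, cols):
    """Edges between horizontally and vertically adjacent patches (assumed connectivity)."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                edges.append((i, i + 1))
            if r + 1 < rows:
                edges.append((i, i + cols))
    return edges

def chain_edges(n, offset):
    """Link consecutive text nodes; offset shifts indices past the patch nodes."""
    return [(offset + i, offset + i + 1) for i in range(n - 1)]

def adjacency(num_nodes, edges):
    """Symmetric adjacency matrix with self-loops."""
    a = np.zeros((num_nodes, num_nodes))
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    return a + np.eye(num_nodes)

def mean_aggregate(a, x):
    """One round of mean-neighbour message passing with no learned weights."""
    return (a @ x) / a.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))  # stand-in for an input image
# Hypothetical explanation text; a real system would query a VLLM here.
explanation = "the left eye shows blurred edges".split()

patch_pixels, rows, cols = split_into_patches(image, patch=16)
dim = 8  # shared feature size for both node types (arbitrary choice)
# Random projection and random vectors stand in for real visual and text encoders.
patch_feats = patch_pixels @ rng.normal(size=(patch_pixels.shape[1], dim))
text_feats = rng.normal(size=(len(explanation), dim))

n_patch, n_text = len(patch_feats), len(explanation)
edges = grid_edges(rows, cols) + chain_edges(n_text, offset=n_patch)
# Assumed cross-modal links: every text node connects to every patch node.
edges += [(p, n_patch + t) for p in range(n_patch) for t in range(n_text)]

a = adjacency(n_patch + n_text, edges)
x = np.vstack([patch_feats, text_feats])
h = mean_aggregate(a, x)          # updated node features
graph_embedding = h.mean(axis=0)  # pooled vector a classifier head could consume
```

In a trained system the aggregation step would carry learned weights and the pooled embedding would feed a real-versus-fake classifier; the sketch only shows how the two graphs can share one node set and one adjacency matrix.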

Taxonomy

Topics: Multimodal Machine Learning Applications