ForgeryGPT: A Multimodal LLM for Interpretable Image Forgery Detection and Localization

Fanrui Zhang; Jiawei Liu; Jiaying Zhu; Esther Sun; Dong Li; Qiang Zhang; and Zheng-Jun Zha

arXiv:2410.10238·cs.CV·April 8, 2026·3 cites

TL;DR

ForgeryGPT introduces a multimodal LLM framework that improves image forgery detection and localization by capturing high-order forensic knowledge and providing explainable, interactive results.

Contribution

ForgeryGPT advances image forgery detection and localization (IFDL) by integrating a Mask-Aware Forgery Extractor with a newly customized LLM architecture and a three-stage training strategy aimed at pixel-level forgery understanding.

Findings

01. Demonstrates effective pixel-level forgery localization.
02. Improves detection accuracy over existing methods.
03. Supports explainable and interactive forgery analysis.
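Pixel-level localization is usually scored by comparing a predicted binary forgery mask against a ground-truth mask. This summary does not say which metrics ForgeryGPT reports. The sketch below assumes pixel-level F1 and IoU, both common in IFDL work, and stores masks as plain nested lists. The function names are hypothetical and chosen for illustration. Predicted probability maps would first need thresholding into a binary mask, and that step is omitted here.

```python
from typing import List, Tuple

Mask = List[List[int]]  # 1 = forged pixel, 0 = authentic pixel


def confusion_counts(pred: Mask, gt: Mask) -> Tuple[int, int, int]:
    """Count true positives, false positives and false negatives over all pixels."""
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("prediction and ground-truth masks must have the same shape")
    tp = fp = fn = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                tp += 1  # forged pixel correctly flagged
            elif p:
                fp += 1  # authentic pixel wrongly flagged
            elif g:
                fn += 1  # forged pixel missed
    return tp, fp, fn


def pixel_f1(pred: Mask, gt: Mask) -> float:
    """F1 = 2TP / (2TP + FP + FN), equivalent to the harmonic mean of precision and recall."""
    tp, fp, fn = confusion_counts(pred, gt)
    denom = 2 * tp + fp + fn
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if denom == 0 else 2 * tp / denom


def pixel_iou(pred: Mask, gt: Mask) -> float:
    """IoU = TP / (TP + FP + FN) over the forged region."""
    tp, fp, fn = confusion_counts(pred, gt)
    denom = tp + fp + fn
    return 1.0 if denom == 0 else tp / denom


gt = [[0, 1, 1],
      [0, 1, 1],
      [0, 0, 0]]
pred = [[0, 1, 1],
        [0, 1, 0],
        [0, 0, 1]]
print(pixel_f1(pred, gt), pixel_iou(pred, gt))  # → 0.75 0.6
```

In this toy example the prediction finds three of the four forged pixels and flags one authentic pixel. F1 therefore gives 6/8 while IoU gives 3/5, because IoU penalizes each error more heavily.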

Abstract

Multimodal Large Language Models (MLLMs), such as GPT-4o, have shown strong capabilities in visual reasoning and explanation generation. Despite these strengths, however, they face significant challenges in the increasingly critical task of Image Forgery Detection and Localization (IFDL). Moreover, existing IFDL methods are typically limited to learning low-level, semantic-agnostic clues and provide only a single outcome judgment. To tackle these issues, we propose ForgeryGPT, a novel framework that advances the IFDL task by capturing high-order forensic knowledge correlations of forged images from diverse linguistic feature spaces, while enabling explainable generation and interactive dialogue through a newly customized Large Language Model (LLM) architecture. Specifically, ForgeryGPT enhances traditional LLMs by integrating the Mask-Aware Forgery Extractor, which enables the…