ForgeryGPT: A Multimodal LLM for Interpretable Image Forgery Detection and Localization
Fanrui Zhang, Jiawei Liu, Jiaying Zhu, Esther Sun, Dong Li, Qiang Zhang, and Zheng-Jun Zha

TL;DR
ForgeryGPT introduces a multimodal LLM framework that improves image forgery detection and localization by capturing high-order forensic knowledge and providing explainable, interactive results.
Contribution
It advances image forgery detection and localization (IFDL) by integrating a Mask-Aware Forgery Extractor into a newly customized LLM architecture, trained with a three-stage strategy, for pixel-level forgery understanding.
Findings
Effective pixel-level forgery localization demonstrated.
Enhanced detection accuracy over existing methods.
Supports explainable and interactive forgery analysis.
Abstract
Multimodal Large Language Models (MLLMs), such as GPT-4o, have shown strong capabilities in visual reasoning and explanation generation. Despite these strengths, they face significant challenges in the increasingly critical task of Image Forgery Detection and Localization (IFDL). Moreover, existing IFDL methods are typically limited to learning low-level, semantic-agnostic clues and provide only a single outcome judgment. To tackle these issues, we propose ForgeryGPT, a novel framework that advances the IFDL task by capturing high-order forensics knowledge correlations of forged images from diverse linguistic feature spaces, while enabling explainable generation and interactive dialogue through a newly customized Large Language Model (LLM) architecture. Specifically, ForgeryGPT enhances traditional LLMs by integrating the Mask-Aware Forgery Extractor, which enables the…
