Rethinking VLMs for Image Forgery Detection and Localization

Shaofeng Guo; Jiequan Cui; Richang Hong

arXiv:2603.12930·cs.CV·March 16, 2026

TL;DR

This paper examines how vision-language models (VLMs) can be adapted for image forgery detection and localization (IFDL). It finds that VLM priors contribute little to detection and localization and can even hurt them, and it proposes a new pipeline, IFDL-VLM, that reports state-of-the-art results on 9 benchmarks along with improved interpretability.

Contribution

The paper introduces IFDL-VLM, a novel pipeline that leverages explicit forgery masks as priors to enhance VLMs for image forgery detection and localization.

Findings

01. Achieves state-of-the-art results on 9 benchmarks.

02. Demonstrates improved generalization across datasets.

03. Enhances interpretability of detection results.

Abstract

With the rapid rise of Artificial Intelligence Generated Content (AIGC), image manipulation has become increasingly accessible, posing significant challenges for image forgery detection and localization (IFDL). In this paper, we study how to fully leverage vision-language models (VLMs) to assist the IFDL task. In particular, we observe that priors from VLMs hardly benefit the detection and localization performance and even have negative effects due to their inherent biases toward semantic plausibility rather than authenticity. Additionally, the location masks explicitly encode the forgery concepts, which can serve as extra priors for VLMs to ease their training optimization, thus enhancing the interpretability of detection and localization results. Building on these findings, we propose a new IFDL pipeline named IFDL-VLM. To demonstrate the effectiveness of our method, we conduct…

Taxonomy

Topics: Digital Media Forensic Detection · Generative Adversarial Networks and Image Synthesis · Misinformation and Its Impacts