Rethinking VLMs for Image Forgery Detection and Localization
Shaofeng Guo, Jiequan Cui, Richang Hong

TL;DR
This paper explores how vision-language models can be adapted for image forgery detection and localization, revealing their limitations and proposing a new pipeline that improves performance and interpretability across multiple benchmarks.
Contribution
The paper introduces IFDL-VLM, a novel pipeline that leverages explicit forgery masks as priors to enhance VLMs for image forgery detection and localization.
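One way to picture "explicit forgery masks as priors" is to attach a localization mask to the image before a VLM sees it. The sketch below is only an illustration under that assumption, not the IFDL-VLM pipeline itself: the function names `mask_prior` and `mask_summary`, the red highlight, and the `alpha` blending weight are all hypothetical choices made here, and the summary does not describe how the authors actually inject the mask.

```python
import numpy as np


def mask_prior(image, mask, alpha=0.5):
    """Blend a binary forgery mask into an RGB image as a red highlight.

    Hypothetical helper: the blended image stands in for a mask-conditioned
    input that a VLM could be asked to describe.
    """
    image = np.asarray(image, dtype=np.float32)
    mask = np.asarray(mask, dtype=bool)
    if image.shape[:2] != mask.shape:
        raise ValueError("image and mask must share height and width")
    highlight = np.array([255.0, 0.0, 0.0], dtype=np.float32)
    out = image.copy()
    # Only pixels flagged as forged are blended toward the highlight colour.
    out[mask] = (1.0 - alpha) * image[mask] + alpha * highlight
    return out.astype(np.uint8)


def mask_summary(mask):
    """Summarise a mask as the forged-area fraction and a bounding box."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return {"forged_fraction": 0.0, "bbox": None}
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    # bbox is (top, left, bottom, right), inclusive pixel indices.
    bbox = (int(rows[0]), int(cols[0]), int(rows[-1]), int(cols[-1]))
    return {"forged_fraction": float(mask.mean()), "bbox": bbox}


# Toy example: a 4x4 black image with a 2x2 forged region in the centre.
image = np.zeros((4, 4, 3), dtype=np.uint8)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True

overlay = mask_prior(image, mask)
summary = mask_summary(mask)
```

The numeric summary could equally be turned into text for a prompt; either form makes the forged region explicit instead of leaving the model to infer it from semantics alone.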
Findings
Achieves state-of-the-art results on 9 benchmarks.
Demonstrates improved generalization across datasets.
Enhances interpretability of detection results.
Abstract
With the rapid rise of Artificial Intelligence Generated Content (AIGC), image manipulation has become increasingly accessible, posing significant challenges for image forgery detection and localization (IFDL). In this paper, we study how to fully leverage vision-language models (VLMs) to assist the IFDL task. In particular, we observe that priors from VLMs hardly benefit the detection and localization performance and even have negative effects due to their inherent biases toward semantic plausibility rather than authenticity. Additionally, the location masks explicitly encode the forgery concepts, which can serve as extra priors for VLMs to ease their training optimization, thus enhancing the interpretability of detection and localization results. Building on these findings, we propose a new IFDL pipeline named IFDL-VLM. To demonstrate the effectiveness of our method, we conduct…
Taxonomy
Topics: Digital Media Forensic Detection · Generative Adversarial Networks and Image Synthesis · Misinformation and Its Impacts
