ForgeryVCR: Visual-Centric Reasoning via Efficient Forensic Tools in MLLMs for Image Forgery Detection and Localization

Youqi Wang; Shen Chen; Haowei Wang; Rongxuan Peng; Taiping Yao; Shunquan Tan; Changsheng Chen; Bin Li; Shouhong Ding

arXiv:2602.14098 · cs.CV · February 17, 2026

Open Access

TL;DR

ForgeryVCR introduces a visual-centric reasoning framework with forensic tools and strategic learning to improve image forgery detection and localization, surpassing existing text-centric models in accuracy and robustness.

Contribution

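The paper presents ForgeryVCR, a framework that pairs a toolbox of explicit visual forensic tools with Strategic Tool Learning, a post-training paradigm combining Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), for image forgery detection and localization.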

Findings

01 Achieves state-of-the-art performance in detection and localization

02 Demonstrates superior generalization and robustness

03 Uses forensic tools with minimal redundancy

Abstract

Existing Multimodal Large Language Models (MLLMs) for image forgery detection and localization predominantly operate under a text-centric Chain-of-Thought (CoT) paradigm. However, forcing these models to textually characterize imperceptible low-level tampering traces inevitably leads to hallucinations, as linguistic modalities are insufficient to capture such fine-grained pixel-level inconsistencies. To overcome this, we propose ForgeryVCR, a framework that incorporates a forensic toolbox to materialize imperceptible traces into explicit visual intermediates via Visual-Centric Reasoning. To enable efficient tool utilization, we introduce a Strategic Tool Learning post-training paradigm, encompassing gain-driven trajectory construction for Supervised Fine-Tuning (SFT) and subsequent Reinforcement Learning (RL) optimization guided by a tool utility reward. This paradigm empowers the MLLM…
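As a rough illustration of what "materializing imperceptible traces into explicit visual intermediates" could mean, the sketch below computes a simple high-pass residual map: each pixel's absolute deviation from its 3x3 local mean. This is a generic forensic heuristic chosen for illustration only; the abstract does not specify which tools the ForgeryVCR toolbox contains, and the names `residual_map` and `mean_energy`, as well as the toy image, are hypothetical.

```python
def residual_map(img):
    """Return |pixel - 3x3 local mean| for a grayscale image given as a list of rows.

    Illustrative stand-in for a forensic tool; not taken from the paper.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Gather the 3x3 neighbourhood, clipped at the image border.
            neigh = [img[j][i]
                     for j in range(max(0, y - 1), min(h, y + 2))
                     for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = abs(img[y][x] - sum(neigh) / len(neigh))
    return out


def mean_energy(res, rows, cols):
    """Average residual magnitude over a rectangular region."""
    vals = [res[y][x] for y in rows for x in cols]
    return sum(vals) / len(vals)


# Toy image (assumption): a flat background with a "pasted" patch whose
# fine texture differs by only +/-3 grey levels, hard to notice by eye.
SIZE = 16
image = [[100.0] * SIZE for _ in range(SIZE)]
for y in range(4, 12):
    for x in range(4, 12):
        image[y][x] = 100.0 + (3.0 if (x + y) % 2 else -3.0)

res = residual_map(image)
patch_energy = mean_energy(res, range(5, 11), range(5, 11))
background_energy = mean_energy(res, range(0, 3), range(0, 3))
print(patch_energy > background_energy)
```

In this toy case the residual map is zero over the untouched background and clearly non-zero inside the pasted patch, so the low-level difference becomes a visible region that a model could inspect instead of describing it in text.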

Taxonomy

Topics: Digital Media Forensic Detection · Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques