DocShield: Towards AI Document Safety via Evidence-Grounded Agentic Reasoning

Fanwei Zeng; Changtao Miao; Jing Huang; Zhiya Tan; Shutao Gong; Xiaoming Yu; Yang Wang; Weibin Yao; Joey Tianyi Zhou; Jianshu Li; Yin Yan

arXiv:2604.02694·cs.CV·April 6, 2026

TL;DR

DocShield introduces a unified, evidence-grounded AI framework for detecting, localizing, and explaining text-centric document forgeries through visual-logical co-reasoning.

Contribution

DocShield presents the first integrated approach that combines visual and textual reasoning for document forgery detection, and it is accompanied by a new multilingual dataset and a code release.

Findings

01 Improves macro F1 by 41.4% over existing methods.

02 Achieves 23.4% higher macro F1 than GPT-4o on T-IC13.

03 Produces evidence-grounded forensic analysis through its Cross-Cues-aware Chain of Thought (CCT) reasoning mechanism.
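
For readers unfamiliar with the metric in the findings above: macro F1 averages the per-class F1 scores, giving every class equal weight regardless of how often it occurs. The sketch below assumes the standard definition; the function name, labels, and predictions are illustrative toy data, not results or evaluation code from the paper.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example (hypothetical labels, not data from the paper).
truth = ["real", "real", "forged", "forged"]
preds = ["real", "forged", "forged", "forged"]
score = macro_f1(truth, preds)
```

Because each class contributes equally, a model that ignores a rare class is penalized more heavily under macro F1 than under plain accuracy.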

Abstract

The rapid progress of generative AI has enabled increasingly realistic text-centric image forgeries, posing major challenges to document safety. Existing forensic methods mainly rely on visual cues and lack evidence-based reasoning to reveal subtle text manipulations. Detection, localization, and explanation are often treated as isolated tasks, limiting reliability and interpretability. To tackle these challenges, we propose DocShield, the first unified framework formulating text-centric forgery analysis as a visual-logical co-reasoning problem. At its core, a novel Cross-Cues-aware Chain of Thought (CCT) mechanism enables implicit agentic reasoning, iteratively cross-validating visual anomalies with textual semantics to produce consistent, evidence-grounded forensic analysis. We further introduce a Weighted Multi-Task Reward for GRPO-based optimization, aligning reasoning structure,…
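
The abstract is truncated before it lists what the Weighted Multi-Task Reward contains, so the following is only a rough sketch of how a weighted reward could feed a GRPO-style update. The component names (`structure`, `detection`, `localization`, `explanation`) and their weights are assumptions chosen for illustration, not the paper's values. The advantage step follows the commonly described GRPO formulation, in which each sampled response's reward is normalized by the mean and standard deviation of its group.

```python
import statistics

# Hypothetical task weights; the paper's actual components and values are not given here.
WEIGHTS = {"structure": 0.2, "detection": 0.4, "localization": 0.2, "explanation": 0.2}


def weighted_reward(components, weights=WEIGHTS):
    """Combine per-task scores (each assumed to lie in [0, 1]) into one scalar reward."""
    return sum(weights[name] * components[name] for name in weights)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses sampled for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Two hypothetical responses to the same document image.
group = [
    {"structure": 1.0, "detection": 1.0, "localization": 0.5, "explanation": 0.0},
    {"structure": 1.0, "detection": 0.0, "localization": 0.0, "explanation": 0.0},
]
rewards = [weighted_reward(c) for c in group]
advantages = group_relative_advantages(rewards)
```

In this setup a response is reinforced only to the extent that it beats the other responses in its group, so the weights decide which sub-task dominates that comparison.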
