AIForge-Doc: A Benchmark for Detecting AI-Forged Tampering in Financial and Form Documents

Jiaqi Wu; Yuchen Zhou; Muduo Xu; Zisheng Liang; Simiao Ren; Jiayu Xue; Meige Yang; Siying Chen; Jingheng Huan

arXiv:2602.20569 · cs.CV · February 25, 2026

TL;DR

AIForge-Doc introduces a new benchmark dataset for detecting AI-generated tampering in financial and form documents, revealing significant challenges for current detection methods due to the sophistication of AI inpainting techniques.

Contribution

This paper presents the first dedicated benchmark for detecting diffusion-model-based inpainting in financial and form documents. It supplies pixel-level annotations and evaluates existing detectors against them, exposing their limitations.

Findings

01. Existing detectors perform poorly on AI-forged documents.

02. To automated detectors and VLMs, AI-forged images are nearly indistinguishable from authentic ones.

03. The benchmark exposes a critical gap in current document forgery detection capabilities.

Abstract

We present AIForge-Doc, the first dedicated benchmark targeting exclusively diffusion-model-based inpainting in financial and form documents with pixel-level annotation. Existing document forgery datasets rely on traditional digital editing tools (e.g., Adobe Photoshop, GIMP), creating a critical gap: state-of-the-art detectors are blind to the rapidly growing threat of AI-forged document fraud. AIForge-Doc addresses this gap by systematically forging numeric fields in real-world receipt and form images using two AI inpainting APIs -- Gemini 2.5 Flash Image and Ideogram v2 Edit -- yielding 4,061 forged images from four public document datasets (CORD, WildReceipt, SROIE, XFUND) across nine languages, annotated with pixel-precise tampered-region masks in DocTamper-compatible format. We benchmark three representative detectors -- TruFor, DocTamper, and a zero-shot GPT-4o judge -- and find…
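The pixel-precise tampered-region masks mentioned in the abstract make it possible to score a detector's output against ground truth one pixel at a time. The following is a minimal sketch of such a comparison, not the paper's evaluation code: the function name `pixel_scores`, the choice of IoU and F1 as metrics, the representation of masks as nested lists of 0/1 values, and the convention that two empty masks count as perfect agreement are all assumptions made for illustration.

```python
from typing import Dict, Sequence

# Assumed representation: rows of 0/1 values, where 1 marks a tampered pixel.
Mask = Sequence[Sequence[int]]


def pixel_scores(pred: Mask, truth: Mask) -> Dict[str, float]:
    """Compare a predicted binary mask with a ground-truth mask, pixel by pixel."""
    if len(pred) != len(truth) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, truth)
    ):
        raise ValueError("masks must have the same shape")

    tp = fp = fn = 0  # true positives, false positives, false negatives
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1

    union = tp + fp + fn
    if union == 0:
        # Assumed convention: nothing tampered and nothing predicted is a perfect match.
        return {"iou": 1.0, "f1": 1.0}
    return {"iou": tp / union, "f1": 2 * tp / (2 * tp + fp + fn)}


# Hypothetical 2x3 example: the detector misses one of four tampered pixels.
truth_mask = [[0, 1, 1],
              [0, 1, 1]]
pred_mask = [[0, 0, 1],
             [0, 1, 1]]
scores = pixel_scores(pred_mask, truth_mask)
print(scores["iou"])  # 3 true positives over a union of 4 pixels
```

In this toy case the IoU is 3/4 and the F1 is 6/7. Scoring at the pixel level matters for a benchmark of this kind because the forged regions are small numeric fields, so a detector that flags the whole image and one that flags nothing can look similar under image-level accuracy while differing sharply here.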

Taxonomy

Topics: Digital Media Forensic Detection · Generative Adversarial Networks and Image Synthesis · Handwritten Text Recognition Techniques