When the Forger Is the Judge: GPT-Image-2 Cannot Recognize Its Own Faked Documents

Jiaqi Wu; Yuchen Zhou; Dennis Tsang Ng; Xingyu Shen; Kidus Zewde; Ankit Raj; Tommy Duong; Simiao Ren

arXiv:2604.25213 · cs.CV · April 29, 2026

TL;DR

GPT-Image-2 produces document forgeries that human inspectors cannot distinguish from authentic images (2AFC accuracy 0.501, at chance). The paper benchmarks human and computational detectors on these forgeries and finds that detection remains difficult.

Contribution

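The paper releases AIForge-Doc v2, a paired dataset of 3,066 GPT-Image-2 document forgeries with pixel-precise masks in DocTamper-compatible format, and benchmarks four lines of defence against it: human inspectors, TruFor, DocTamper, and GPT-Image-2 itself as a zero-shot self-judge.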

Findings

01. Humans perform at chance level in detecting forgeries.

02. Computational detectors only modestly outperform chance.

03. Detection accuracy drops significantly when identifying GPT-Image-2 inpainting.

Abstract

OpenAI's GPT-Image-2 has effectively erased the visual boundary between authentic and AI-edited document images: a single number on a receipt can be replaced in under a second for a few cents. We release AIForge-Doc v2, a paired dataset of 3,066 GPT-Image-2 document forgeries with pixel-precise masks in DocTamper-compatible format, and benchmark four lines of defence: human inspectors (N=120, n=365 pair-votes via the public 2AFC site CanUSpotAI.com), TruFor (generic forensic), DocTamper (qcf-568, document-specific), and the same GPT-Image-2 model as a zero-shot self-judge -- asked, to avoid the trivial "image is mostly real" reading, whether any region was generated or edited by an AI image model. Human 2AFC accuracy is 0.501, indistinguishable from chance: even side-by-side, inspectors cannot tell GPT-Image-2 receipt forgeries from authentic counterparts. The three computational judges…
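
The "indistinguishable from chance" reading of the human result can be checked with a short calculation. The sketch below is not from the paper. It assumes the 365 pair-votes behave like independent coin flips (they come from 120 inspectors, so this is only approximately true), and it infers 183 correct votes from the rounded accuracy 0.501 × 365; the function names are illustrative.

```python
from math import comb, sqrt


def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probability of every outcome that is no more likely
    than the observed count k under the null success rate p.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small tolerance so ties with the observed outcome are included.
    total = sum(q for q in pmf if q <= observed * (1 + 1e-9))
    return min(1.0, total)


def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    phat = k / n
    denom = 1 + z**2 / n
    centre = (phat + z**2 / (2 * n)) / denom
    half = z * sqrt(phat * (1 - phat) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    n_votes = 365    # pair-votes reported in the abstract
    n_correct = 183  # ASSUMPTION: inferred from round(0.501 * 365)
    print("p-value vs chance:", binom_two_sided_p(n_correct, n_votes))
    print("95% Wilson interval:", wilson_interval(n_correct, n_votes))
```

Under these assumptions the interval comfortably contains 0.5, which is what a chance-level 2AFC result looks like. A more careful analysis would account for repeated votes from the same inspector, for example with a per-inspector random effect.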
