DOCFORGE-BENCH: A Comprehensive 0-shot Benchmark for Document Forgery Detection and Analysis

Zengqi Zhao; Weidi Xia; En Wei; Yan Zhang; Jane Mo; Tiannan Zhang; Yuanqin Dai; Zexi Chen; Yiran Tao; Simiao Ren

arXiv:2603.01433·cs.CV·March 11, 2026

TL;DR

DOCFORGE-BENCH is a zero-shot benchmark for document forgery detection. It shows that existing methods are poorly calibrated on document images, that adapting the decision threshold recovers much of the lost performance, and that reliable detection across diverse document types remains an open challenge.

Contribution

The paper presents the first unified zero-shot benchmark for document forgery detection. It evaluates 14 methods across eight datasets using published pretrained weights and no domain adaptation, and it uncovers calibration failures that affect real-world deployment.

Findings

01. Methods achieve moderate Pixel-AUC but near-zero Pixel-F1.

02. Calibration failure is due to score-distribution shift in tampered regions.

03. Threshold adaptation significantly improves detection performance.

Abstract

We present DOCFORGE-BENCH, the first unified zero-shot benchmark for document forgery detection, evaluating 14 methods across eight datasets spanning text tampering, receipt forgery, and identity document manipulation. Unlike fine-tuning-oriented evaluations such as ForensicHub [Du et al., 2025], DOCFORGE-BENCH applies all methods with their published pretrained weights and no domain adaptation -- a deliberate design choice that reflects the realistic deployment scenario where practitioners lack labeled document training data. Our central finding is a pervasive calibration failure invisible under single-threshold protocols: methods achieve moderate Pixel-AUC (>=0.76) yet near-zero Pixel-F1. This AUC-F1 gap is not a discrimination failure but a score-distribution shift: tampered regions occupy only 0.27-4.17% of pixels in document images -- an order of magnitude less than in natural…
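The AUC-F1 gap described above can be reproduced on synthetic data. The following is a minimal sketch, not the paper's evaluation code: the score distributions, the 2% tamper rate (chosen to fall inside the reported 0.27-4.17% range), the fixed threshold of 0.5, and the oracle threshold sweep are all assumptions made for illustration. In particular, the excerpt does not describe the paper's threshold adaptation procedure, so the sweep below simply picks the F1-maximizing threshold using the ground-truth labels.

```python
import numpy as np


def pixel_auc(scores, labels):
    """Rank-based AUC (Mann-Whitney U). Assumes continuous scores without ties."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores), dtype=np.float64)
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    u_stat = ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


def pixel_f1(scores, labels, threshold):
    """F1 of the binary mask obtained by thresholding the score map."""
    pred = scores >= threshold
    tp = np.sum(pred & (labels == 1))
    fp = np.sum(pred & (labels == 0))
    fn = np.sum(~pred & (labels == 1))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom > 0 else 0.0


rng = np.random.default_rng(0)
n_pixels = 200_000
tamper_rate = 0.02  # assumed; inside the 0.27-4.17% range quoted in the abstract

labels = (rng.random(n_pixels) < tamper_rate).astype(int)

# Hypothetical detector: tampered pixels get higher logits on average,
# but both classes sit far below the logit of 0 (probability 0.5).
logits = np.where(
    labels == 1,
    rng.normal(-1.4, 0.5, n_pixels),
    rng.normal(-2.0, 0.5, n_pixels),
)
scores = 1.0 / (1.0 + np.exp(-logits))

auc = pixel_auc(scores, labels)
f1_fixed = pixel_f1(scores, labels, threshold=0.5)

# Oracle threshold sweep over upper score quantiles (uses ground truth,
# so it is an upper bound for illustration, not a deployable procedure).
candidates = np.quantile(scores, np.linspace(0.5, 0.999, 200))
f1_adapted = max(pixel_f1(scores, labels, t) for t in candidates)

print(f"Pixel-AUC: {auc:.3f}")
print(f"Pixel-F1 at threshold 0.5: {f1_fixed:.3f}")
print(f"Pixel-F1 at best swept threshold: {f1_adapted:.3f}")
```

In this toy setup the ranking of pixels is reasonable, so AUC stays moderate, yet almost no pixel crosses 0.5, so the fixed-threshold F1 collapses. Moving the threshold into the range where the scores actually live restores a usable F1 without changing the detector.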

Taxonomy

Topics: Digital Media Forensic Detection · Handwritten Text Recognition Techniques · Generative Adversarial Networks and Image Synthesis