How Do Document Parsers Break? Auditing Structural Vulnerability in Document Intelligence
Yue Chen, Yihao Wang, Ziyi Tang, Keze Wang

TL;DR
This paper introduces a lightweight, structure-aware auditing framework for Document Layout Analysis (DLA) systems. It exposes vulnerabilities that area-based (footprint) robustness evaluation misses.
Contribution
The paper proposes an output-level auditing framework that combines Block-level Structural Loss Rate (B-SLR), granularity-aware exposure descriptors, and pathway attribution to locate DLA vulnerabilities.
Findings
B-SLR tracks perturbation-induced OCR instability more closely than affected area does (R^2=0.727/0.916 versus 0.384/0.110 on MinerU and PP-StructureV3).
Small structurally targeted probes cause significant downstream performance degradation.
The framework shifts robustness evaluation from footprint-based to structure-aware auditing.
Abstract
Document Layout Analysis (DLA) pipelines provide structured page representations for retrieval-augmented generation, long-document question answering, and other document intelligence systems, yet their robustness evaluation remains largely area-centric. We identify this Footprint Bias and propose a lightweight output-level auditing framework that decouples probe construction, policy-driven targeting, and structure-aware diagnosis. The framework combines Block-level Structural Loss Rate (B-SLR), granularity-aware exposure descriptors, and pathway attribution to analyze where perturbations interact with layout structure and how failures propagate. Across MinerU and PP-StructureV3 on 1,000 pages, affected area weakly tracks perturbation-induced OCR instability (R^2=0.384/0.110), whereas B-SLR aligns much more closely with it (R^2=0.727/0.916). Exposure descriptors further separate…
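The abstract does not spell out how B-SLR is computed, so the sketch below is only an illustration under stated assumptions, not the authors' definition. It assumes a block counts as structurally lost when no block in the perturbed output overlaps it with IoU at or above a threshold, and it computes R^2 as the squared Pearson correlation of a one-variable linear fit. The function names (`block_loss_rate`, `r_squared`), the box format, and the 0.5 threshold are all hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical box format: (x0, y0, x1, y1) in page coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def block_loss_rate(
    baseline: List[Box], perturbed: List[Box], iou_threshold: float = 0.5
) -> float:
    """Assumed stand-in for a block-level loss rate: the fraction of
    baseline blocks with no perturbed block at IoU >= iou_threshold."""
    if not baseline:
        return 0.0
    lost = sum(
        1
        for b in baseline
        if all(iou(b, p) < iou_threshold for p in perturbed)
    )
    return lost / len(baseline)


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """R^2 of a simple linear fit of y on x (squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series explains no variance
    return (sxy * sxy) / (sxx * syy)


# Toy page: four baseline blocks; after perturbation one block shrinks
# below the IoU threshold and one disappears entirely.
baseline_blocks = [(0, 0, 10, 10), (20, 0, 30, 10), (0, 20, 10, 30), (20, 20, 30, 30)]
perturbed_blocks = [(0, 0, 10, 10), (20, 0, 30, 10), (0, 20, 10, 24)]
toy_loss = block_loss_rate(baseline_blocks, perturbed_blocks)
```

In an audit of this kind, one would compute such a loss rate and the affected area for each perturbed page, then compare `r_squared(loss_rates, ocr_instability)` against `r_squared(affected_areas, ocr_instability)`. The toy data here is invented purely to exercise the functions and says nothing about the paper's results.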
