LED: A Benchmark for Evaluating Layout Error Detection in Document Analysis
Inbum Heo, Taewook Hwang, Jeesu Jung, Sangkeun Jung

TL;DR
This paper introduces LED, a comprehensive benchmark for evaluating structural error detection in document layout analysis, addressing limitations of traditional metrics by focusing on logical inconsistencies and reasoning capabilities.
Contribution
We propose LED, a new benchmark with standardized error types, realistic error simulation, and evaluation tasks to assess structural reasoning in document layout analysis models.
Findings
State-of-the-art models show weaknesses in structural understanding.
LED enables detailed assessment of model reasoning capabilities.
The benchmark reveals modality- and architecture-specific deficiencies.
Abstract
Recent advances in Large Language Models (LLMs) and Large Multimodal Models (LMMs) have improved Document Layout Analysis (DLA), yet structural errors such as region merging, splitting, and omission remain persistent. Conventional overlap-based metrics (e.g., IoU, mAP) fail to capture such logical inconsistencies. To overcome this limitation, we propose Layout Error Detection (LED), a benchmark that evaluates structural reasoning in DLA predictions beyond surface-level accuracy. LED defines eight standardized error types (Missing, Hallucination, Size Error, Split, Merge, Overlap, Duplicate, and Misclassification) and provides quantitative rules and injection algorithms for realistic error simulation. Using these definitions, we construct LED-Dataset and design three evaluation tasks: document-level error detection, document-level error-type classification, and element-level error-type…
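To make the abstract's point about overlap-based metrics concrete, the following is a minimal sketch, not the paper's injection algorithm. The `Box` type, the `inject_split` and `inject_merge` helpers, and the halving and bounding-union rules are all hypothetical, chosen only for illustration; the eight error-type names are the ones listed in the abstract.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class ErrorType(Enum):
    # The eight error types named in the abstract.
    MISSING = "Missing"
    HALLUCINATION = "Hallucination"
    SIZE_ERROR = "Size Error"
    SPLIT = "Split"
    MERGE = "Merge"
    OVERLAP = "Overlap"
    DUPLICATE = "Duplicate"
    MISCLASSIFICATION = "Misclassification"


@dataclass(frozen=True)
class Box:
    # Hypothetical layout element: axis-aligned box plus a class label.
    x0: float
    y0: float
    x1: float
    y1: float
    label: str

    @property
    def area(self) -> float:
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)


def iou(a: Box, b: Box) -> float:
    # Standard intersection-over-union of two axis-aligned boxes.
    inter_w = max(0.0, min(a.x1, b.x1) - max(a.x0, b.x0))
    inter_h = max(0.0, min(a.y1, b.y1) - max(a.y0, b.y0))
    inter = inter_w * inter_h
    union = a.area + b.area - inter
    return inter / union if union > 0 else 0.0


def inject_split(box: Box) -> List[Box]:
    # Assumed rule: cut the region horizontally at its vertical midpoint.
    mid = (box.y0 + box.y1) / 2
    return [
        Box(box.x0, box.y0, box.x1, mid, box.label),
        Box(box.x0, mid, box.x1, box.y1, box.label),
    ]


def inject_merge(a: Box, b: Box) -> Box:
    # Assumed rule: replace two regions with their bounding union.
    return Box(min(a.x0, b.x0), min(a.y0, b.y0),
               max(a.x1, b.x1), max(a.y1, b.y1), a.label)


paragraph = Box(0, 0, 100, 40, "text")
caption = Box(0, 50, 100, 90, "text")

halves = inject_split(paragraph)
merged = inject_merge(paragraph, caption)

split_ious = [iou(h, paragraph) for h in halves]
merge_iou = iou(merged, paragraph)
print(split_ious, round(merge_iou, 3))
```

In this toy case each half of the split paragraph has an IoU of exactly 0.5 with the original region, which sits right at a commonly used matching threshold, so an overlap score alone does not say that one logical region became two. The score records how much area agrees, while the error label (`ErrorType.SPLIT` or `ErrorType.MERGE` here) records what went wrong structurally.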
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning in Materials Science
