LED Benchmark: Diagnosing Structural Layout Errors for Document Layout Analysis
Inbum Heo, Taewook Hwang, Jeesu Jung, Sangkeun Jung

TL;DR
This paper introduces LED, a benchmark for diagnosing structural layout errors in document layout analysis. It addresses the limitations of overlap-based metrics such as IoU and mAP by evaluating specific error types and the structural robustness of layout predictions.
Contribution
It proposes a new benchmark and dataset for evaluating structural errors in document layout analysis, highlighting model robustness beyond spatial overlap metrics.
Findings
LED effectively differentiates structural understanding capabilities
Reveals modality biases and performance trade-offs in models
Exposes limitations of traditional evaluation metrics
Abstract
Recent advancements in Document Layout Analysis through Large Language Models and Multimodal Models have significantly improved layout detection. However, despite these improvements, challenges remain in addressing critical structural errors, such as region merging, splitting, and missing content. Conventional evaluation metrics like IoU and mAP, which focus primarily on spatial overlap, are insufficient for detecting these errors. To address this limitation, we propose Layout Error Detection (LED), a novel benchmark designed to evaluate the structural robustness of document layout predictions. LED defines eight standardized error types, and formulates three complementary tasks: error existence detection, error type classification, and element-wise error type classification. Furthermore, we construct LED-Dataset, a synthetic dataset generated by injecting realistic structural errors…
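To illustrate why a spatial-overlap score alone says little about the kind of structural error, the following is a minimal Python sketch. It is not the LED method or dataset: the box coordinates, the `looks_like_merge` helper, and the 0.9 coverage threshold are all hypothetical choices made for illustration. It shows a prediction that merges two adjacent regions into one box; IoU reports a single number per pair, while a simple containment check flags the merge explicitly.

```python
from typing import List, Tuple

# Axis-aligned box as (x1, y1, x2, y2); the format is an assumption for this sketch.
Box = Tuple[float, float, float, float]


def area(b: Box) -> float:
    """Area of a box; degenerate or inverted boxes count as zero."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def intersection(a: Box, b: Box) -> float:
    """Area of the overlap between two boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return area((x1, y1, x2, y2))


def iou(a: Box, b: Box) -> float:
    """Standard intersection-over-union."""
    inter = intersection(a, b)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def covered_gt(pred: Box, gts: List[Box], min_cover: float = 0.9) -> List[int]:
    """Indices of ground-truth boxes lying almost entirely inside `pred`."""
    return [
        i for i, g in enumerate(gts)
        if area(g) > 0 and intersection(pred, g) / area(g) >= min_cover
    ]


def looks_like_merge(pred: Box, gts: List[Box], min_cover: float = 0.9) -> bool:
    """Hypothetical heuristic: one prediction swallowing two or more regions."""
    return len(covered_gt(pred, gts, min_cover)) >= 2


# Two stacked paragraphs (ground truth) and one prediction covering both.
gt_boxes: List[Box] = [(0, 0, 100, 50), (0, 50, 100, 100)]
merged_pred: Box = (0, 0, 100, 100)

ious = [iou(merged_pred, g) for g in gt_boxes]
print("IoU against each region:", ious)
print("Flagged as merge:", looks_like_merge(merged_pred, gt_boxes))
```

In this toy case the merged box scores an IoU of 0.5 against each paragraph, which a threshold-based metric may or may not count as a match, and in neither case does the score say that the error is a merge. A type-aware check like the one above is one way to make that distinction visible; the benchmark's own error definitions may differ.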
Taxonomy
Topics: Web Applications and Data Management · BIM and Construction Integration · Semantic Web and Ontologies
