The COTe score: A decomposable framework for evaluating Document Layout Analysis models
Jonathan Bourne, Mwiza Simbeye, Ishtar Govia

TL;DR
This paper introduces the COTe score, a new decomposable metric for evaluating Document Layout Analysis models that focuses on semantic structure, providing more nuanced insights than traditional object detection metrics.
Contribution
The paper presents the COTe score and SSU framework, offering a more robust and interpretable evaluation method for DLA models, along with a labeled dataset and Python library.
Findings
The COTe score outperforms traditional metrics in revealing model failure modes.
The COTe score reduces the interpretation-performance gap by up to 76%.
The SSU approach is effective even without explicit labeling.
Abstract
Document Layout Analysis (DLA) is the process by which a page is parsed into meaningful elements, often using machine learning models. Typically, the quality of a model is judged using general object detection metrics such as IoU, F1, or mAP. However, these metrics are designed for images that are 2D projections of 3D space, not for the natively 2D imagery of printed media. This discrepancy can lead the metrics to give a misleading or uninformative picture of model performance. To encourage more robust, comparable, and nuanced DLA, we introduce the Structural Semantic Unit (SSU), a relational labelling approach that shifts the focus from the physical to the semantic structure of the content, and the Coverage, Overlap, Trespass, and Excess (COTe) score, a decomposable metric for measuring page parsing quality. We demonstrate the value of these methods through case studies and by…
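To make the abstract's concern about IoU concrete, here is a minimal sketch of standard axis-aligned IoU applied to two hypothetical predictions for one text region. It does not implement the COTe score or the paper's Python library, whose definitions are not given on this page; the `Box` type, the `iou` helper, and the coordinates are illustrative assumptions.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box in page coordinates (hypothetical helper type)."""
    x0: float
    y0: float
    x1: float
    y1: float


def area(b: Box) -> float:
    # Clamp at zero so boxes with no extent contribute no area.
    return max(0.0, b.x1 - b.x0) * max(0.0, b.y1 - b.y0)


def iou(a: Box, b: Box) -> float:
    # Intersection over Union for two axis-aligned boxes.
    inter = Box(max(a.x0, b.x0), max(a.y0, b.y0),
                min(a.x1, b.x1), min(a.y1, b.y1))
    inter_area = area(inter)
    union = area(a) + area(b) - inter_area
    return inter_area / union if union > 0 else 0.0


# Illustrative ground-truth region and two different kinds of error.
ground_truth = Box(0, 0, 100, 100)
undersized = Box(0, 0, 100, 50)    # misses half of the region
oversized = Box(0, 0, 100, 200)    # covers the region but spills beyond it

iou_under = iou(ground_truth, undersized)
iou_over = iou(ground_truth, oversized)
print(iou_under, iou_over)
```

Both predictions receive the same IoU even though one loses content and the other includes content from outside the region. A single scalar cannot tell these failure modes apart, which is the kind of ambiguity a decomposable metric is meant to expose.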
Taxonomy
Topics: Handwritten Text Recognition Techniques · Advanced Neural Network Applications · Multimodal Machine Learning Applications
