The COTe score: A decomposable framework for evaluating Document Layout Analysis models

Jonathan Bourne; Mwiza Simbeye; Ishtar Govia

arXiv:2603.12718 · cs.CV · March 17, 2026

TL;DR

This paper introduces the COTe score, a new decomposable metric for evaluating Document Layout Analysis models that focuses on semantic structure, providing more nuanced insights than traditional object detection metrics.

Contribution

The paper presents the COTe score and the SSU framework, offering a more robust and interpretable evaluation method for DLA models, along with a labelled dataset and a Python library.

Findings

1. The COTe score outperforms traditional metrics in revealing model failure modes.
2. COTe reduces the interpretation-performance gap by up to 76%.
3. The SSU approach is effective even without explicit labelling.

Abstract

Document Layout Analysis (DLA) is the process by which a page is parsed into meaningful elements, often using machine learning models. Typically, the quality of a model is judged using general object detection metrics such as IoU, F1, or mAP. However, these metrics are designed for images that are 2D projections of 3D space, not for the natively 2D imagery of printed media. As a result, they can give a misleading or uninformative picture of model performance. To encourage more robust, comparable, and nuanced DLA, we introduce two methods: the Structural Semantic Unit (SSU), a relational labelling approach that shifts the focus from the physical to the semantic structure of the content; and the Coverage, Overlap, Trespass, and Excess (COTe) score, a decomposable metric for measuring page parsing quality. We demonstrate the value of these methods through case studies and by…
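The abstract's point that a single overlap number can hide how a prediction fails can be illustrated with plain bounding-box IoU. The sketch below is a minimal illustration assuming axis-aligned boxes; `Box`, `iou`, and `missed_and_extra` are hypothetical helpers written for this example, and the missed/extra breakdown is a generic toy decomposition, not the paper's COTe definitions (which this summary does not give).

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Hypothetical axis-aligned box: (x0, y0) top-left, (x1, y1) bottom-right."""
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        # Degenerate or inverted boxes get zero area.
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)


def intersection_area(a: Box, b: Box) -> float:
    # Width and height of the overlapping rectangle; zero if the boxes are disjoint.
    w = min(a.x1, b.x1) - max(a.x0, b.x0)
    h = min(a.y1, b.y1) - max(a.y0, b.y0)
    return max(0.0, w) * max(0.0, h)


def iou(gt: Box, pred: Box) -> float:
    # Standard intersection-over-union: a single number in [0, 1].
    inter = intersection_area(gt, pred)
    union = gt.area() + pred.area() - inter
    return inter / union if union > 0 else 0.0


def missed_and_extra(gt: Box, pred: Box) -> tuple[float, float]:
    # Hypothetical two-part breakdown (NOT the COTe components):
    # missed = share of the ground-truth area the prediction fails to cover,
    # extra  = share of the predicted area lying outside the ground truth.
    inter = intersection_area(gt, pred)
    missed = (gt.area() - inter) / gt.area() if gt.area() > 0 else 0.0
    extra = (pred.area() - inter) / pred.area() if pred.area() > 0 else 0.0
    return missed, extra


if __name__ == "__main__":
    gt = Box(0, 0, 10, 10)
    too_big = Box(0, 0, 10, 20)   # spills into the content below the element
    too_small = Box(0, 0, 10, 5)  # covers only the top half of the element
    for name, pred in [("too_big", too_big), ("too_small", too_small)]:
        print(name, iou(gt, pred), missed_and_extra(gt, pred))
    # too_big 0.5 (0.0, 0.5)
    # too_small 0.5 (0.5, 0.0)
```

Both predictions receive an IoU of 0.5, yet one over-reaches into neighbouring content and the other truncates the element; only the decomposed view tells the two failure modes apart, which is the kind of distinction a decomposable score is meant to expose.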

Taxonomy

Topics: Handwritten Text Recognition Techniques · Advanced Neural Network Applications · Multimodal Machine Learning Applications