From Codicology to Code: A Comparative Study of Transformer and YOLO-based Detectors for Layout Analysis in Historical Documents
Sergio Torres Aguilar

TL;DR
This study compares Transformer and YOLO-based object detection models for layout analysis in complex historical documents, revealing that model choice depends on dataset complexity and that oriented bounding boxes are crucial for accuracy.
Contribution
It provides a comprehensive benchmark of Transformer and YOLO models on diverse historical document datasets, highlighting the importance of bounding box representation.
Findings
Transformers excel on simpler datasets like e-NDP.
YOLO-OBB outperforms others on complex datasets like CATMuS and HORAE.
Oriented Bounding Boxes are essential for accurate layout modeling.
Abstract
Robust Document Layout Analysis (DLA) is critical for the automated processing and understanding of historical documents with complex page organizations. This paper benchmarks five state-of-the-art object detection architectures on three annotated datasets representing a spectrum of codicological complexity: e-NDP, a corpus of Parisian medieval registers (1326-1504); CATMuS, a diverse multiclass dataset derived from various medieval and modern sources (ca. 12th-17th centuries); and HORAE, a corpus of decorated books of hours (ca. 13th-16th centuries). We evaluate two Transformer-based models (Co-DETR, Grounding DINO) against three YOLO variants (AABB, OBB, and YOLO-World). Our findings reveal significant performance variations dependent on model architecture, dataset characteristics, and bounding box representation. In the e-NDP dataset, Co-DETR achieves state-of-the-art results…
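To illustrate why the bounding box representation matters, the following minimal sketch compares an oriented bounding box (OBB) with the axis-aligned bounding box (AABB) that encloses it. It is a geometric illustration only, not the paper's evaluation code: the function names and the example text line (1000 px long, 40 px tall, skewed by 5 degrees) are hypothetical and are not taken from any of the three datasets.

```python
import math


def obb_corners(cx, cy, w, h, angle_deg):
    """Corners of a w x h rectangle centred at (cx, cy), rotated by angle_deg."""
    t = math.radians(angle_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    half = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + x * cos_t - y * sin_t, cy + x * sin_t + y * cos_t) for x, y in half]


def enclosing_aabb(corners):
    """Smallest axis-aligned box (x0, y0, x1, y1) containing all corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)


def aabb_area(box):
    x0, y0, x1, y1 = box
    return (x1 - x0) * (y1 - y0)


# Hypothetical skewed text line (values chosen for illustration only).
w, h, angle = 1000.0, 40.0, 5.0
corners = obb_corners(500.0, 300.0, w, h, angle)

obb_area = w * h                               # area of the oriented box
box_area = aabb_area(enclosing_aabb(corners))  # area of the axis-aligned box
inflation = box_area / obb_area                # how much extra area the AABB covers

print(f"OBB area: {obb_area:.0f} px^2")
print(f"AABB area: {box_area:.0f} px^2")
print(f"AABB / OBB ratio: {inflation:.2f}")
```

Under these assumed dimensions, even a small skew makes the axis-aligned box several times larger than the line it annotates, so neighbouring lines or marginal elements can fall inside it. An oriented box avoids that overlap at the cost of one extra regression parameter (the angle).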
Taxonomy
Topics: Image Processing and 3D Reconstruction · Handwritten Text Recognition Techniques
Methods: Sparse Evolutionary Training
