You Actually Look Twice At it (YALTAi): using an object detection approach instead of region segmentation within the Kraken engine
Thibault Clérice (ENC, CJM, HiSoMA, UJML, ALMAnaCH)

TL;DR
This paper introduces a novel approach to layout analysis in OCR. It replaces pixel-based segmentation with object detection using YOLOv5 within the Kraken engine, improving efficiency and accuracy, especially on small datasets.
Contribution
The paper proposes shifting from pixel classification to object detection for layout analysis, integrating YOLOv5 into Kraken, and provides new datasets and a package for this method.
Findings
YOLOv5 outperforms Kraken at segmentation on small datasets.
Object detection simplifies layout analysis compared to polygonizing pixel-classification output.
New datasets facilitate training and evaluation of object detection methods for historical documents.
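To illustrate why object detection simplifies the path from model output to layout zones: a detector such as YOLOv5 already emits one axis-aligned box per zone, so turning a prediction into a region polygon is a coordinate conversion rather than contour tracing over a pixel mask. This is a minimal sketch, assuming YOLO's normalized text label format (`class x_center y_center width height`, each coordinate in [0, 1]). The `Zone` class, the `yolo_line_to_zone` function and the class names are hypothetical illustrations, not part of Kraken's or the paper's package API.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    """A hypothetical isothetic (axis-aligned) layout zone in pixel coordinates."""
    label: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int

    def polygon(self) -> list[tuple[int, int]]:
        """Return the four corners, clockwise from the top-left, as a region polygon."""
        return [
            (self.x_min, self.y_min),
            (self.x_max, self.y_min),
            (self.x_max, self.y_max),
            (self.x_min, self.y_max),
        ]


def yolo_line_to_zone(line: str, class_names: list[str], img_w: int, img_h: int) -> Zone:
    """Convert one YOLO label line ("cls xc yc w h", normalized to [0, 1]) to pixels."""
    cls, xc, yc, w, h = line.split()
    # Scale normalized center/size values to the page dimensions.
    xc, w = float(xc) * img_w, float(w) * img_w
    yc, h = float(yc) * img_h, float(h) * img_h
    # Clamp to the page so a box touching the border never leaves the image.
    x_min = max(0, round(xc - w / 2))
    y_min = max(0, round(yc - h / 2))
    x_max = min(img_w, round(xc + w / 2))
    y_max = min(img_h, round(yc + h / 2))
    return Zone(class_names[int(cls)], x_min, y_min, x_max, y_max)


# Usage with illustrative (assumed) class names.
names = ["MainZone", "MarginTextZone"]
zone = yolo_line_to_zone("0 0.5 0.5 0.6 0.8", names, img_w=1000, img_h=1500)
print(zone.label, zone.polygon())
# → MainZone [(200, 150), (800, 150), (800, 1350), (200, 1350)]
```

Because every zone is a rectangle, the four corners fully describe it; a pixel-classification pipeline would instead need to extract and simplify contours before it could produce the same kind of region.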
Abstract
Layout Analysis (the identification of zones and their classification) is, along with line segmentation, the first step in Optical Character Recognition and similar tasks. The ability to distinguish the main body of text from marginal text or running titles makes the difference between extracting the full text of a digitized book and producing noisy output. We show that most segmenters focus on pixel classification and that polygonization of this output has not been used as a target in the latest competitions on historical documents (ICDAR 2017 and onwards), despite being the focus in the early 2010s. We propose, for efficiency, to shift the task from pixel-classification-based polygonization to object detection using isothetic rectangles. We compare the output of Kraken and YOLOv5 in terms of segmentation and show that the latter substantially outperforms the former on small datasets (1110 samples and…
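The abstract is cut off before it names its evaluation metrics, so the following is only a generic illustration of how isothetic-rectangle predictions can be scored against ground truth, not the paper's protocol. It computes intersection over union (IoU) between two axis-aligned boxes and uses a simplified greedy matching at an assumed IoU threshold of 0.5 to derive precision and recall. The function names, the threshold and the zone labels are hypothetical.

```python
Box = tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def rect_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned rectangles."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    # Negative extents mean the rectangles do not overlap.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_zones(
    preds: list[tuple[str, Box]],
    truths: list[tuple[str, Box]],
    threshold: float = 0.5,  # assumed threshold, not taken from the paper
) -> tuple[float, float]:
    """Greedily match predictions to same-label ground-truth zones; return (precision, recall)."""
    unmatched = list(range(len(truths)))
    tp = 0
    for p_label, p_box in preds:
        best, best_iou = None, threshold
        for i in unmatched:
            t_label, t_box = truths[i]
            if t_label != p_label:
                continue  # a zone only counts if its class is also correct
            iou = rect_iou(p_box, t_box)
            if iou >= best_iou:
                best, best_iou = i, iou
        if best is not None:
            unmatched.remove(best)  # each ground-truth zone is matched at most once
            tp += 1
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(truths) if truths else 0.0
    return precision, recall


truths = [("MainZone", (0, 0, 100, 100)), ("MarginTextZone", (110, 0, 130, 100))]
preds = [("MainZone", (0, 0, 100, 90)), ("MainZone", (110, 0, 130, 100))]
print(match_zones(preds, truths))  # → (0.5, 0.5)
```

Standard detection benchmarks usually sort predictions by confidence before matching and may average over several IoU thresholds; this sketch keeps the input order and a single threshold for brevity.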
