TL;DR
This paper introduces dhSegment, a versatile deep learning framework using a CNN-based pixel predictor for multiple document segmentation tasks, demonstrating competitive results and high flexibility across various historical document processing challenges.
Contribution
The paper presents a unified CNN-based approach capable of handling diverse document segmentation tasks with minimal task-specific modifications.
Findings
A single CNN architecture achieves competitive results across tasks.
Most post-processing steps are simple, standard, and reusable.
Framework demonstrates high flexibility and generality.
Abstract
In recent years there have been multiple successful attempts at tackling document processing problems separately by designing task-specific hand-tuned strategies. We argue that the diversity of historical document processing tasks prohibits solving them one at a time and shows a need for designing generic approaches in order to handle the variability of historical series. In this paper, we address multiple tasks simultaneously, such as page extraction, baseline extraction, layout analysis or multiple typologies of illustrations and photograph extraction. We propose an open-source implementation of a CNN-based pixel-wise predictor coupled with task-dependent post-processing blocks. We show that a single CNN architecture can be used across tasks with competitive results. Moreover, most of the task-specific post-processing steps can be decomposed into a small number of simple and standard…
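To make the "pixel-wise predictor plus post-processing block" idea concrete, here is a minimal sketch of one possible post-processing step applied to a per-pixel probability map. It is not dhSegment's API: the function names `threshold` and `largest_component_bbox` are hypothetical, the probability map is a toy array standing in for a network output, and the choice of thresholding followed by connected-component selection is an assumption made for illustration, since the abstract excerpt above is truncated before it names the actual operations.

```python
from collections import deque

import numpy as np


def threshold(prob_map, t=0.5):
    """Binarize a per-pixel probability map (values in [0, 1])."""
    return prob_map >= t


def largest_component_bbox(mask):
    """Return (row_min, col_min, row_max, col_max) of the largest
    4-connected foreground region, or None if the mask is empty."""
    height, width = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    best_size, best_bbox = 0, None
    for r0 in range(height):
        for c0 in range(width):
            if not mask[r0, c0] or seen[r0, c0]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(r0, c0)])
            seen[r0, c0] = True
            size = 0
            rmin, cmin, rmax, cmax = r0, c0, r0, c0
            while queue:
                r, c = queue.popleft()
                size += 1
                rmin, rmax = min(rmin, r), max(rmax, r)
                cmin, cmax = min(cmin, c), max(cmax, c)
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    inside = 0 <= nr < height and 0 <= nc < width
                    if inside and mask[nr, nc] and not seen[nr, nc]:
                        seen[nr, nc] = True
                        queue.append((nr, nc))
            if size > best_size:
                best_size, best_bbox = size, (rmin, cmin, rmax, cmax)
    return best_bbox


# Toy stand-in for a network output (assumption, not real data):
# a 6x8 probability map with one large region and one stray pixel.
prob = np.full((6, 8), 0.1)
prob[1:5, 2:7] = 0.9   # the region of interest, e.g. a page
prob[0, 0] = 0.8       # an isolated false positive
bbox = largest_component_bbox(threshold(prob))
```

In this sketch the stray pixel survives thresholding but is discarded because it forms a smaller component than the main region. The same two helpers could be reused for any task whose output is a single dominant region, which is the kind of reuse the summary's findings describe.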
