dhSegment: A generic deep-learning approach for document segmentation

Sofia Ares Oliveira, Benoit Seguin, and Frederic Kaplan (Digital Humanities Laboratory, EPFL, Switzerland)

arXiv:1804.10371 · cs.CV · August 15, 2019

TL;DR

This paper introduces dhSegment, a versatile deep learning framework using a CNN-based pixel predictor for multiple document segmentation tasks, demonstrating competitive results and high flexibility across various historical document processing challenges.

Contribution

The paper presents a unified CNN-based approach capable of handling diverse document segmentation tasks with minimal task-specific modifications.
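The idea of one pixel-wise predictor shared across tasks can be illustrated with a minimal sketch. It assumes the CNN itself is stubbed out and represented only by its output, C per-pixel score maps. The names `pixelwise_softmax`, `threshold_class`, `argmax_labels`, `run_task`, and the `TASKS` registry with its class counts are hypothetical and chosen for illustration; they are not the dhSegment API. The point is that switching tasks changes only the number of output classes and the post-processing function, not the predictor.

```python
import math
from typing import Callable, Dict, List, Tuple

Grid = List[List[float]]  # one H x W map


def pixelwise_softmax(logits: List[Grid]) -> List[Grid]:
    """Convert C score maps of size H x W into per-pixel class probabilities."""
    n_classes = len(logits)
    h, w = len(logits[0]), len(logits[0][0])
    probs = [[[0.0] * w for _ in range(h)] for _ in range(n_classes)]
    for y in range(h):
        for x in range(w):
            scores = [logits[c][y][x] for c in range(n_classes)]
            m = max(scores)  # subtract the max for numerical stability
            exps = [math.exp(s - m) for s in scores]
            total = sum(exps)
            for c in range(n_classes):
                probs[c][y][x] = exps[c] / total
    return probs


def threshold_class(probs: List[Grid], cls: int, t: float = 0.5) -> List[List[int]]:
    """Binary mask of pixels whose probability for class `cls` exceeds `t`."""
    return [[int(p > t) for p in row] for row in probs[cls]]


def argmax_labels(probs: List[Grid]) -> List[List[int]]:
    """Label map assigning each pixel its most probable class."""
    h, w = len(probs[0]), len(probs[0][0])
    return [[max(range(len(probs)), key=lambda c: probs[c][y][x]) for x in range(w)]
            for y in range(h)]


# Hypothetical task registry: only the class count and post-processing differ.
TASKS: Dict[str, Tuple[int, Callable[[List[Grid]], List[List[int]]]]] = {
    "page": (2, lambda p: threshold_class(p, cls=1)),  # background vs. page
    "layout": (4, argmax_labels),                      # illustrative class count
}


def run_task(task: str, logits: List[Grid]) -> List[List[int]]:
    """Apply the shared softmax head, then the task's own post-processing."""
    n_classes, post_process = TASKS[task]
    if len(logits) != n_classes:
        raise ValueError(f"{task} expects {n_classes} score maps, got {len(logits)}")
    return post_process(pixelwise_softmax(logits))
```

For example, `run_task("page", [[[0.0, 2.0]], [[2.0, 0.0]]])` returns `[[1, 0]]`: the first pixel scores higher for the page class, so it is kept in the mask, and the second is not.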

Findings

1. A single CNN architecture achieves competitive results across tasks.
2. Most post-processing steps are simple, standard, and reusable.
3. The framework demonstrates high flexibility and generality.

Abstract

In recent years there have been multiple successful attempts at tackling document processing problems separately by designing task-specific, hand-tuned strategies. We argue that the diversity of historical document processing tasks makes it impractical to solve them one at a time and shows a need for generic approaches that can handle the variability of historical series. In this paper, we address multiple tasks simultaneously, such as page extraction, baseline extraction, layout analysis, and the extraction of multiple typologies of illustrations and photographs. We propose an open-source implementation of a CNN-based pixel-wise predictor coupled with task-dependent post-processing blocks. We show that a single CNN architecture can be used across tasks with competitive results. Moreover, most of the task-specific post-processing steps can be decomposed into a small number of simple and standard…
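The claim that post-processing decomposes into a few simple, standard operations can be illustrated as follows. This is a minimal sketch, assuming the network has already produced a probability map for a single class (for example, illustrations). Thresholding, 4-connected component labeling, an area filter, and bounding boxes are generic image-processing steps chosen for illustration, and the function names `binarize`, `connected_components`, `bounding_box`, and `extract_boxes` are hypothetical rather than part of the dhSegment code base.

```python
from collections import deque
from typing import List, Tuple

Mask = List[List[int]]
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def binarize(prob_map: List[List[float]], threshold: float = 0.5) -> Mask:
    """Operation 1: threshold a per-pixel probability map into a binary mask."""
    return [[int(p > threshold) for p in row] for row in prob_map]


def connected_components(mask: Mask) -> List[List[Tuple[int, int]]]:
    """Operation 2: group foreground pixels into 4-connected regions (BFS)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            region, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions


def bounding_box(region: List[Tuple[int, int]]) -> Box:
    """Operation 3: axis-aligned box enclosing a region of (y, x) pixels."""
    ys = [y for y, _ in region]
    xs = [x for _, x in region]
    return (min(xs), min(ys), max(xs), max(ys))


def extract_boxes(prob_map: List[List[float]], threshold: float = 0.5,
                  min_area: int = 1) -> List[Box]:
    """Compose the primitives; regions smaller than `min_area` are discarded."""
    regions = connected_components(binarize(prob_map, threshold))
    return [bounding_box(r) for r in regions if len(r) >= min_area]
```

With `pm = [[0.9, 0.8, 0.1, 0.0], [0.7, 0.2, 0.1, 0.6], [0.0, 0.1, 0.1, 0.9]]`, `extract_boxes(pm)` returns `[(0, 0, 1, 1), (3, 1, 3, 2)]`, and raising `min_area` to 3 keeps only the first box. Because each step is independent, the same primitives could be rearranged for other tasks without touching the predictor.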
