Multiple Document Datasets Pre-training Improves Text Line Detection With Deep Neural Networks

Mélodie Boillet; Christopher Kermorvant; Thierry Paquet

arXiv:2012.14163·cs.CV·September 20, 2021

TL;DR

This paper presents a fully convolutional U-shaped network for document layout analysis and shows that pre-training on multiple document datasets improves performance, with no need for pre-training on natural scene images.

Contribution

The study introduces a document-specific pre-training approach that improves text line detection and layout analysis using a U-shaped network trained from scratch.

Findings

01

Pre-training on multiple document datasets improves accuracy.

02

Natural scene image pre-training is unnecessary for good results.

03

The proposed method outperforms state-of-the-art methods on various datasets.

Abstract

In this paper, we introduce a fully convolutional network for the document layout analysis task. While state-of-the-art methods use models pre-trained on natural scene images, our method, Doc-UFCN, relies on a U-shaped model trained from scratch to detect objects in historical documents. We treat line segmentation, and more generally layout analysis, as a pixel-wise classification task, so our model outputs a pixel labeling of the input images. We show that Doc-UFCN outperforms state-of-the-art methods on various datasets and demonstrate that parts pre-trained on natural scene images are not required to reach good results. In addition, we show that pre-training on multiple document datasets can improve performance. We evaluate the models using various metrics to provide a fair and complete comparison between the methods.
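
To make the pixel-wise classification framing concrete, here is a minimal sketch in plain Python. It is not the Doc-UFCN implementation: the function names `label_pixels` and `iou`, the two-class setup (0 = background, 1 = text line), and the toy score maps are all assumptions made for illustration. The abstract does not list which metrics are used, so intersection-over-union appears here only as one common pixel-level metric.

```python
from typing import List

ScoreMaps = List[List[List[float]]]  # scores[class][row][col]
LabelMap = List[List[int]]           # labels[row][col]


def label_pixels(scores: ScoreMaps) -> LabelMap:
    """Assign each pixel the class with the highest score (per-pixel argmax)."""
    n_classes = len(scores)
    height = len(scores[0])
    width = len(scores[0][0])
    return [
        [max(range(n_classes), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


def iou(pred: LabelMap, truth: LabelMap, cls: int) -> float:
    """Pixel-level intersection-over-union for one class."""
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += int(p == cls and t == cls)
            union += int(p == cls or t == cls)
    # Convention assumed here: a class absent from both maps counts as a perfect match.
    return inter / union if union else 1.0


# Hypothetical 2x3 score maps, standing in for a network's per-class output.
background = [[0.9, 0.2, 0.1], [0.8, 0.7, 0.6]]
text_line = [[0.1, 0.8, 0.9], [0.2, 0.3, 0.4]]

pred = label_pixels([background, text_line])
truth = [[0, 1, 1], [0, 0, 1]]  # hypothetical ground-truth labeling

print(pred)                 # [[0, 1, 1], [0, 0, 0]]
print(iou(pred, truth, 0))  # 0.75
```

In this toy case the predicted map misses one text-line pixel in the bottom-right corner, so the text-line IoU is 2/3 and the background IoU is 0.75.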
