A Survey of Historical Document Image Datasets

Konstantina Nikolaidou; Mathias Seuret; Hamam Mokayed; Marcus Liwicki

arXiv:2203.08504·cs.CV·November 1, 2022·1 cites

TL;DR

This paper systematically reviews 65 datasets for historical document image analysis, categorizing tasks, summarizing dataset features, and discussing challenges and standards to improve research comparability.

Contribution

It provides a comprehensive meta-study of existing datasets, including statistics, task types, and benchmarks, highlighting gaps and proposing standardization practices.

Findings

01. 65 datasets analyzed with detailed summaries

02. Identification of gaps in dataset formats and evaluation metrics

03. Recommendations for standardization and conversion tools

Abstract

This paper presents a systematic literature review of image datasets for document image analysis, focusing on historical documents, such as handwritten manuscripts and early prints. Finding appropriate datasets for historical document analysis is a crucial prerequisite to facilitate research using different machine learning algorithms. However, because of the very large variety of the actual data (e.g., scripts, tasks, dates, support systems, and amount of deterioration), the different formats for data and label representation, and the different evaluation processes and benchmarks, finding appropriate datasets is a difficult task. This work fills this gap, presenting a meta-study on existing datasets. After a systematic selection process (according to PRISMA guidelines), we select 65 studies that are chosen based on different factors, such as the year of publication, number of methods…

Taxonomy

Topics: Handwritten Text Recognition Techniques · Image Retrieval and Classification Techniques · Image Processing and 3D Reconstruction