Unfolding the Past: A Comprehensive Deep Learning Approach to Analyzing Incunabula Pages

Klaudia Ropel; Krzysztof Kutt; Luiz do Valle Miranda; Grzegorz J. Nalepa

arXiv:2506.18069·cs.DL·June 25, 2025

TL;DR

This paper presents a proof-of-concept deep learning pipeline for analyzing incunabula pages that combines object detection, OCR, and illustration classification; it reports F1 = 0.94 for layout detection and 98.7% accuracy for illustration-type classification.

Contribution

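The paper introduces a custom dataset of 500 annotated pages from five incunabula and combines several deep learning models (YOLO11 for layout detection, Tesseract and Kraken for OCR, ResNet18 for illustration classification) to analyze the structure and content of early printed book pages.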

Findings

01. YOLO11n achieved F1 = 0.94 on custom data

02. Tesseract OCR outperformed Kraken OCR on Text regions

03. ResNet18 achieved 98.7% accuracy in classifying illustration types
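
The F1 score in the first finding is the harmonic mean of precision and recall. The sketch below shows one common way to compute it for object detection: each predicted box is greedily matched to an unmatched ground-truth box by intersection over union (IoU). The IoU threshold of 0.5, the greedy matching rule, and the names `iou` and `f1_score` are assumptions for illustration and are not taken from the paper.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f1_score(preds: List[Box], truths: List[Box], iou_thr: float = 0.5) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), with greedy one-to-one box matching."""
    matched = set()  # indices of ground-truth boxes already claimed
    tp = 0
    for p in preds:
        best_j, best_iou = -1, 0.0
        for j, t in enumerate(truths):
            if j in matched:
                continue
            v = iou(p, t)
            if v > best_iou:
                best_j, best_iou = j, v
        if best_j >= 0 and best_iou >= iou_thr:
            matched.add(best_j)
            tp += 1
    fp = len(preds) - tp   # predictions with no matching ground truth
    fn = len(truths) - tp  # ground-truth boxes that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

With one correct detection, one spurious detection, and one missed region, this yields precision = recall = 0.5 and therefore F1 = 0.5.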

Abstract

We developed a proof-of-concept method for the automatic analysis of the structure and content of incunabula pages. A custom dataset comprising 500 annotated pages from five different incunabula was created using resources from the Jagiellonian Digital Library. Each page was manually labeled with five predefined classes: Text, Title, Picture, Table, and Handwriting. Additionally, the publicly available DocLayNet dataset was utilized as supplementary training data. To perform object detection, YOLO11n and YOLO11s models were employed and trained using two strategies: a combined dataset (DocLayNet and the custom dataset) and the custom dataset alone. The highest performance (F1 = 0.94) was achieved by the YOLO11n model trained exclusively on the custom data. Optical character recognition was then conducted on regions classified as Text, using both Tesseract and Kraken OCR, with Tesseract…
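
The detect-then-recognize step described in the abstract can be sketched as follows: keep only the regions labeled Text, crop them from the page, and hand each crop to an OCR engine. This is a minimal illustration, not the authors' implementation. The `Detection` class, the `crop` and `ocr_text_regions` helpers, the nested-list image representation, and the top-to-bottom, left-to-right ordering are all hypothetical choices made to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Detection:
    """One detected region; field names are hypothetical."""
    label: str  # e.g. "Text", "Title", "Picture", "Table", "Handwriting"
    x1: int
    y1: int
    x2: int
    y2: int


def crop(image: Sequence[Sequence[int]], det: Detection) -> List[List[int]]:
    """Cut the detection's bounding box out of a row-major pixel grid."""
    return [list(row[det.x1:det.x2]) for row in image[det.y1:det.y2]]


def ocr_text_regions(
    image: Sequence[Sequence[int]],
    detections: List[Detection],
    ocr_fn: Callable[[List[List[int]]], str],
) -> List[str]:
    """Run ocr_fn on every region labeled "Text", in a simple reading order."""
    text_dets = [d for d in detections if d.label == "Text"]
    # Assumed reading order: top-to-bottom, then left-to-right.
    text_dets.sort(key=lambda d: (d.y1, d.x1))
    return [ocr_fn(crop(image, d)) for d in text_dets]
```

Here `ocr_fn` is any callable that turns a cropped region into a string. In practice it might wrap an engine such as Tesseract (for example through `pytesseract.image_to_string` on a real image object), but that wiring is left out of this sketch.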

Taxonomy

Methods: Contrastive Language-Image Pre-training