The Effects of Character-Level Data Augmentation on Style-Based Dating of Historical Manuscripts
Lisa Koopmans, Maruf A. Dhali, Lambert Schomaker

TL;DR
This study investigates how character-level data augmentation improves the accuracy of machine learning models in dating historical manuscripts, demonstrating modest performance gains across various collections.
Contribution
The paper introduces data augmentation techniques tailored for paleographic features, enhancing the robustness of models used for dating ancient manuscripts.
Findings
Data augmentation improves dating accuracy by 1-3%.
Models trained with augmented data outperform baseline models.
Script-specific models may yield further improvements.
Abstract
Identifying the production dates of historical manuscripts is one of the main goals for paleographers when studying ancient documents. Automatized methods can provide paleographers with objective tools to estimate dates more accurately. Previously, statistical features have been used to date digitized historical manuscripts based on the hypothesis that handwriting styles change over periods. However, the sparse availability of such documents poses a challenge in obtaining robust systems. Hence, the research of this article explores the influence of data augmentation on the dating of historical manuscripts. Linear Support Vector Machines were trained with k-fold cross-validation on textural and grapheme-based features extracted from historical manuscripts of different collections, including the Medieval Paleographical Scale, early Aramaic manuscripts, and the Dead Sea Scrolls. Results…
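The training setup named in the abstract (linear Support Vector Machines evaluated with k-fold cross-validation on extracted feature vectors) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline. It assumes scikit-learn and NumPy, treats dating as classification into a few hypothetical date classes, uses stratified folds, and substitutes synthetic vectors for the textural and grapheme-based features. The function name `cross_validate_linear_svm` and all parameter values are placeholders.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import LinearSVC


def cross_validate_linear_svm(features, labels, n_splits=5, seed=0):
    """Return the per-fold accuracy of a linear SVM under stratified k-fold CV."""
    splitter = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in splitter.split(features, labels):
        # A fresh model per fold, so nothing learned on one fold leaks into another.
        model = LinearSVC(C=1.0, max_iter=10000)
        model.fit(features[train_idx], labels[train_idx])
        scores.append(model.score(features[test_idx], labels[test_idx]))
    return np.array(scores)


# Synthetic stand-in data (an assumption, not manuscript features):
# 3 hypothetical date classes, 40 samples each, 16-dimensional vectors.
# Each class gets a raised mean in its own dimension so classes are separable.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 40)
features = rng.normal(size=(120, 16)) + 5.0 * np.eye(3, 16)[labels]

scores = cross_validate_linear_svm(features, labels)
print("mean accuracy over folds:", scores.mean())
```

If augmented samples were added to such a setup, a common precaution would be to generate them from the training fold only, so that variants of a test sample never appear in training. Whether the paper does this is not stated in the excerpt above.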
Taxonomy
Topics: Image Processing and 3D Reconstruction · Handwritten Text Recognition Techniques · Natural Language Processing Techniques
