TL;DR
This paper introduces a deep learning-based handwriting classification method to assist art-historic researchers by automatically labeling text fragments in digitized documents, enabling efficient analysis without full reading.
Contribution
It proposes a novel handwriting classification task, with models tailored to multi-language art-historic documents, that addresses the lack of annotated training data and supports targeted document retrieval.
Findings
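Deep learning models can classify handwritten text fragments, such as numbers, dates, or words, based on their visual structure.
Classification aids historians by highlighting documents that contain a specific class of text, without the need to read the entire content.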
The approach demonstrates practical utility on real-world art-historic datasets.
Abstract
Digitized archives contain and preserve the knowledge of generations of scholars in millions of documents. The size of these archives calls for automatic analysis, since manual analysis by specialists is often too expensive. In this paper, we focus on the analysis of handwriting in scanned documents from the art-historic archive of the WPI. Since the archive consists of documents written in several languages and lacks annotated training data for the creation of recognition models, we propose the task of handwriting classification as a new step for a handwriting OCR pipeline. We propose a handwriting classification model that labels extracted text fragments, e.g., numbers, dates, or words, based on their visual structure. Such a classification supports historians by highlighting documents that contain a specific class of text without the need to read the entire content. To this end, we…
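The retrieval use case described in the abstract can be illustrated with a minimal sketch. It assumes a classifier has already assigned a class label to each extracted fragment. The `Fragment` record, the `documents_with_class` helper, the confidence threshold, and the sample predictions are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """One extracted text fragment with its predicted class (hypothetical schema)."""
    doc_id: str    # identifier of the scanned document (assumed)
    label: str     # predicted class, e.g. "number", "date", or "word"
    score: float   # classifier confidence in [0, 1] (assumed to be available)


def documents_with_class(fragments, wanted, min_score=0.5):
    """Group fragments predicted as `wanted` by the document they came from.

    Fragments below `min_score` are ignored, so a document is only
    highlighted when at least one confident prediction supports it.
    """
    hits = defaultdict(list)
    for frag in fragments:
        if frag.label == wanted and frag.score >= min_score:
            hits[frag.doc_id].append(frag)
    return dict(hits)


# Made-up predictions, for illustration only.
predictions = [
    Fragment("scan_001", "date", 0.93),
    Fragment("scan_001", "word", 0.88),
    Fragment("scan_002", "number", 0.71),
    Fragment("scan_003", "date", 0.42),
]

dated = documents_with_class(predictions, "date")
print(sorted(dated))  # documents a historian would be pointed to for dates
```

With these sample values, only the document with a confident "date" prediction is returned; lowering `min_score` would trade precision for recall.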
