HANA: A HAndwritten NAme Database for Offline Handwritten Text Recognition
Christian M. Dahl, Torben Johansen, Emil N. Sørensen, Simon Wittrock

TL;DR
HANA is a large-scale handwritten name database designed to improve offline handwritten text recognition, especially for personal names, by providing extensive data for training and benchmarking deep learning models, leading to enhanced transcription accuracy.
Contribution
The paper introduces HANA, a comprehensive database of over 3.3 million handwritten names, enabling transfer learning and benchmarking for improved handwritten text recognition models.
Findings
Significantly improved transcription accuracy on Danish and US census data.
HANA enables transfer learning for handwritten name recognition.
Benchmark results demonstrate state-of-the-art performance.
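The findings above refer to transcription accuracy without saying how it is scored. As a minimal sketch, assuming two common measures for name transcription (exact-match word accuracy and character error rate based on edit distance), the Python below shows how predictions could be compared with ground-truth labels. The function names and the sample names are hypothetical and are not taken from HANA or from the paper's code.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def word_accuracy(predictions, labels):
    """Share of names transcribed exactly right."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("need two non-empty lists of equal length")
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)


def character_error_rate(predictions, labels):
    """Total edit distance divided by total number of label characters."""
    if len(predictions) != len(labels):
        raise ValueError("need two lists of equal length")
    total_chars = sum(len(t) for t in labels)
    if total_chars == 0:
        raise ValueError("labels contain no characters")
    total_edits = sum(levenshtein(p, t) for p, t in zip(predictions, labels))
    return total_edits / total_chars


# Illustrative data only (hypothetical, not drawn from the database).
labels = ["jensen", "nielsen", "hansen", "pedersen"]
predictions = ["jensen", "nielsen", "hansen", "petersen"]

print(word_accuracy(predictions, labels))         # 3 of 4 names match exactly
print(character_error_rate(predictions, labels))  # 1 edit over 27 characters
```

Word accuracy is the stricter of the two: a single wrong character makes the whole name count as an error, which matters for record linking because a near-miss name can still fail to match.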
Abstract
Methods for linking individuals across historical data sets, typically in combination with AI-based transcription models, are developing rapidly. Probably the single most important identifier for linking is personal names. However, personal names are prone to enumeration and transcription errors, and although modern linking methods are designed to handle such challenges, these sources of errors are critical and should be minimized. For this purpose, improved transcription methods and large-scale databases are crucial components. This paper describes and provides documentation for HANA, a newly constructed large-scale database which consists of more than 3.3 million names. The database contains more than 105 thousand unique names with a total of more than 1.1 million images of personal names, which proves useful for transfer learning to other settings. We provide three examples hereof,…
Taxonomy
Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Text and Document Classification Technologies
