The LAM Dataset: A Novel Benchmark for Line-Level Handwritten Text Recognition
Silvia Cascianelli, Vittorio Pippi, Martin Maarand, Marcella Cornia, Lorenzo Baraldi, Christopher Kermorvant, Rita Cucchiara

TL;DR
This paper introduces the LAM dataset, a large line-level handwritten text recognition benchmark of Italian ancient manuscripts, designed to advance research on recognizing historical handwriting that varies over time.
Contribution
The paper presents the LAM dataset, a new large-scale benchmark for line-level HTR of Italian manuscripts, including two data splits to evaluate recognition across different time periods.
Findings
State-of-the-art HTR models are evaluated on LAM to provide baseline results.
The dataset enables studying the variability of a single author's handwriting over 60 years.
LAM compares favorably with existing line-level HTR benchmarks.
Abstract
Handwritten Text Recognition (HTR) is an open problem at the intersection of Computer Vision and Natural Language Processing. When dealing with historical manuscripts, the main challenges stem from the state of preservation of the paper support, the variability of the handwriting (even of the same author over a wide time span), and the scarcity of data from ancient, poorly represented languages. With the aim of fostering research on this topic, in this paper we present the Ludovico Antonio Muratori (LAM) dataset, a large line-level HTR dataset of Italian ancient manuscripts edited by a single author over 60 years. The dataset comes in two configurations: a basic splitting and a date-based splitting that takes into account the age of the author. The first setting is intended to study HTR on ancient documents in Italian, while the second focuses on the ability of HTR systems to recognize…
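The two configurations described in the abstract differ in how text lines are assigned to training and test sets. The following is a minimal sketch of that idea under stated assumptions. The `LineSample` record and its fields, the random 80/20 split used as a stand-in for the basic configuration, and the held-out year range are all hypothetical illustration choices. They are not the official LAM split definitions or file format.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class LineSample:
    # Hypothetical record for one text-line image; field names are
    # illustrative and NOT the actual LAM annotation format.
    image_path: str
    transcription: str
    year: int  # year the source document was written


def basic_split(samples, test_fraction=0.2, seed=0):
    """Date-agnostic random split (an assumed stand-in for a 'basic' setting).

    Returns (train, test); every sample lands in exactly one of the two lists.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # deterministic for a given seed
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def date_based_split(samples, held_out_start, held_out_end):
    """Hold out every line written in [held_out_start, held_out_end].

    The model never sees handwriting from that period during training, so
    test accuracy measures generalization to an unseen time span.
    """
    train = [s for s in samples if not held_out_start <= s.year <= held_out_end]
    test = [s for s in samples if held_out_start <= s.year <= held_out_end]
    return train, test


if __name__ == "__main__":
    # Toy data only: six fake lines with made-up years.
    toy = [LineSample(f"line_{i}.png", f"text {i}", year)
           for i, year in enumerate([1695, 1702, 1718, 1725, 1733, 1741])]
    train, test = date_based_split(toy, 1720, 1729)
    print([s.year for s in train], [s.year for s in test])
    # [1695, 1702, 1718, 1733, 1741] [1725]
```

Holding out a whole contiguous period, rather than sampling lines at random, keeps every line from that period out of training. A random split would instead leak lines from all periods into the training set and could not measure recognition across different time periods.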
Taxonomy
Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Image Processing and 3D Reconstruction
