End-to-end information extraction in handwritten documents: Understanding Paris marriage records from 1880 to 1940
Thomas Constum, Lucas Preel, Théo Larcher, Pierrick Tranouez, Thierry Paquet, Sandra Brée

TL;DR
This paper introduces an end-to-end deep learning architecture for extracting detailed information from handwritten Paris marriage records spanning 1880-1940, achieving state-of-the-art results and providing a new dataset for research.
Contribution
The paper presents a novel end-to-end model for joint handwritten text recognition and information extraction, along with a new annotated dataset for full-page documents.
Findings
Achieved state-of-the-art full-page information extraction results on Esposalles.
Demonstrated the effectiveness of different encoding strategies for named entity recognition.
Provided a publicly available annotated dataset for handwritten document analysis.
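As an illustration of what an encoding strategy for named entities can look like when recognition and extraction are performed jointly, the sketch below parses a transcription in which entity boundaries are written as inline tags. The tag syntax, the label names (`groom_name`, `birthplace`), the sample sentence, and the `parse_tagged` helper are all hypothetical and are not taken from the paper; this shows only one possible encoding under those assumptions.

```python
import re

# Assumed inline-tag encoding: each entity is wrapped as <label>text</label>.
# The labels and the sample text below are invented for illustration.
TAG_RE = re.compile(r"<(?P<label>\w+)>(?P<text>.*?)</(?P=label)>")


def parse_tagged(transcription):
    """Split a tagged transcription into plain text and (label, text) pairs."""
    entities = [
        (m.group("label"), m.group("text"))
        for m in TAG_RE.finditer(transcription)
    ]
    # Drop every opening or closing tag to recover the raw transcription.
    plain_text = re.sub(r"</?\w+>", "", transcription)
    return plain_text, entities


sample = "<groom_name>Jean Martin</groom_name>, born in <birthplace>Rouen</birthplace>"
plain, ents = parse_tagged(sample)
```

With this kind of encoding, a single output sequence carries both the transcription and the extracted fields, so one decoding pass can serve both tasks.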
Abstract
The EXO-POPP project aims to establish a comprehensive database comprising 300,000 marriage records from Paris and its suburbs, spanning the years 1880 to 1940, which are preserved in over 130,000 scans of double pages. Each marriage record may encompass up to 118 distinct types of information that require extraction from plain text. In this paper, we introduce the M-POPP dataset, a subset of the M-POPP database with annotations for full-page text recognition and information extraction in both handwritten and printed documents, and which is now publicly available. We present a fully end-to-end architecture adapted from the DAN, designed to perform both handwritten text recognition and information extraction directly from page images without the need for explicit segmentation. We showcase the information extraction capabilities of this architecture by achieving a new state of the art for…
