Normalized vs Diplomatic Annotation: A Case Study of Automatic Information Extraction from Handwritten Uruguayan Birth Certificates
Natalia Bottaioli (Université Paris-Saclay, ENS Paris-Saclay, CNRS, Centre Borelli, France, Facultad de Ingeniería, Universidad de la República, Montevideo, Uruguay, Digital Sense, Montevideo, Uruguay) Solène Tarride (TEKLIA, Paris

TL;DR
This paper compares normalized and diplomatic annotation strategies for automatic information extraction from handwritten Uruguayan birth certificates and shows that the better strategy depends on the field type.
Contribution
It introduces a comparative analysis of annotation strategies for handwritten document information extraction using the Document Attention Network (DAN).
Findings
Normalized annotation excels for standardized fields like dates and places.
Diplomatic annotation is better for non-standardized fields like names.
Minimal training data suffices for effective fine-tuning of DAN.
Abstract
This study evaluates the recently proposed Document Attention Network (DAN) for extracting key-value information from Uruguayan birth certificates handwritten in Spanish. We investigate two annotation strategies for automatically transcribing handwritten documents, fine-tuning DAN with minimal training data and annotation effort. Experiments were conducted on two datasets containing the same images (201 scans of birth certificates written by more than 15 different writers) but annotated with different methods. Our findings indicate that normalized annotation is more effective for fields that can be standardized, such as dates and places of birth, whereas diplomatic annotation performs much better for fields containing names and surnames, which cannot be standardized.
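To make the two annotation strategies concrete, the sketch below shows how the same certificate might be labeled under each scheme. It is an illustration only: the field names, the sample values, and the ISO-style date format are assumptions, not taken from the paper's datasets. A diplomatic annotation reproduces the text as written on the page, while a normalized annotation maps it to a standard form.

```python
# Hypothetical example: field names and values are invented for illustration
# and do not come from the paper's datasets.

# Diplomatic annotation: the target is the text as the writer wrote it.
diplomatic = {
    "name": "Maria Josefa",
    "date_of_birth": "veinte de marzo de mil novecientos diez",
    "place_of_birth": "Dpto. de Canelones",
}

# Normalized annotation: standardizable fields are mapped to a canonical form
# (the ISO date format here is an assumption). Names have no canonical form,
# so they stay as written.
normalized = {
    "name": "Maria Josefa",
    "date_of_birth": "1910-03-20",
    "place_of_birth": "Canelones",
}


def fields_changed_by_normalization(diplomatic, normalized):
    """Return the sorted field names whose target text differs between schemes."""
    return sorted(k for k in diplomatic if diplomatic[k] != normalized.get(k))


print(fields_changed_by_normalization(diplomatic, normalized))
```

In this toy case only the date and place fields differ between the two schemes, which mirrors the distinction the abstract draws between fields that can be standardized and fields, such as names, that cannot.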
