TransDocAnalyser: A Framework for Offline Semi-structured Handwritten Document Analysis in the Legal Domain
Sagar Chakraborty, Gaurav Harit, Saptarshi Ghosh

TL;DR
This paper introduces TransDocAnalyser, an end-to-end framework that uses Transformer-based models for offline analysis of semi-structured handwritten legal documents, and reports state-of-the-art results on a new, challenging dataset of First Information Report (FIR) documents.
Contribution
The paper presents the first semi-structured handwritten legal document dataset and a novel Transformer-based framework for localizing, labeling, and recognizing form fields in such documents.
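For each form field, the three stages named above produce a location, a semantic label, and a transcription. The sketch below shows one plausible way such per-field outputs could be merged into a single key-value record for a document. It assumes each prediction carries a confidence score and that only the best prediction per label should be kept. `FieldPrediction`, `assemble_record`, `min_score`, and the field labels are hypothetical illustrations and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FieldPrediction:
    """Hypothetical output of the localize -> label -> recognize stages for one field."""
    box: Tuple[int, int, int, int]  # (x0, y0, x1, y1) localized region in pixels
    label: str                      # semantic field label, e.g. "complainant_name" (illustrative)
    text: str                       # recognized handwritten or printed content
    score: float                    # confidence of the localization/labeling step


def assemble_record(preds: List[FieldPrediction], min_score: float = 0.5) -> Dict[str, str]:
    """Keep the highest-scoring prediction per label, dropping any below min_score."""
    best: Dict[str, FieldPrediction] = {}
    for p in preds:
        if p.score < min_score:
            continue  # discard low-confidence detections
        if p.label not in best or p.score > best[p.label].score:
            best[p.label] = p
    return {label: p.text for label, p in best.items()}


# Usage with made-up predictions for one hypothetical FIR page.
preds = [
    FieldPrediction((40, 120, 400, 160), "complainant_name", "R. Sharma", 0.92),
    FieldPrediction((40, 118, 400, 158), "complainant_name", "R. Sharrna", 0.61),
    FieldPrediction((40, 200, 300, 240), "date_of_report", "12/03/2019", 0.88),
    FieldPrediction((500, 10, 600, 30), "police_station", "???", 0.30),
]
record = assemble_record(preds)
print(record)
```

Keeping one prediction per label is a simple design choice for forms where each field appears once; a real system might instead merge overlapping boxes or keep multiple values.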
Findings
Achieved state-of-the-art accuracy on the FIR dataset.
Outperformed existing models in handwritten semi-structured document analysis.
Demonstrated the effectiveness of domain-specific tokenization and post-correction methods.
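The summary credits part of the gain to post-correction but does not say how it works. As a generic illustration only, here is a minimal dictionary-based post-corrector. It assumes a domain lexicon of expected field values is available and snaps each recognized token to its nearest lexicon entry by Levenshtein distance, but only when that distance is small relative to the token length. `levenshtein`, `post_correct`, `max_ratio`, and `HYPOTHETICAL_LEXICON` are illustrative names, not the authors' method or API.

```python
from typing import Iterable, Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def post_correct(token: str, lexicon: Iterable[str], max_ratio: float = 0.34) -> str:
    """Return the closest lexicon entry if it is within max_ratio * len(token) edits."""
    best: Optional[str] = None
    best_dist: Optional[int] = None
    for entry in lexicon:
        d = levenshtein(token.lower(), entry.lower())  # case-insensitive comparison
        if best_dist is None or d < best_dist:
            best, best_dist = entry, d
    if best is not None and best_dist is not None and best_dist <= max_ratio * max(len(token), 1):
        return best
    return token  # no sufficiently close entry: keep the OCR output unchanged


# Hypothetical domain vocabulary (e.g. place names expected in a form field).
HYPOTHETICAL_LEXICON = ["Kolkata", "Howrah", "Durgapur"]
print([post_correct(t, HYPOTHETICAL_LEXICON) for t in ["Kolkala", "Howrh", "Xyz"]])
```

The length-relative threshold keeps short or unrelated tokens from being forced onto a lexicon entry.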
Abstract
State-of-the-art offline Optical Character Recognition (OCR) frameworks perform poorly on semi-structured handwritten domain-specific documents due to their inability to localize and label form fields with domain-specific semantics. Existing techniques for semi-structured document analysis have primarily used datasets comprising invoices, purchase orders, receipts, and identity-card documents for benchmarking. In this work, we build the first semi-structured document analysis dataset in the legal domain by collecting a large number of First Information Report (FIR) documents from several police stations in India. This dataset, which we call the FIR dataset, is more challenging than most existing document analysis datasets, since it combines a wide variety of handwritten text with printed text. We also propose an end-to-end framework for offline processing of handwritten semi-structured…
Taxonomy
Topics: Handwritten Text Recognition Techniques · Hand Gesture Recognition Systems · Vehicle License Plate Recognition
