Seeing Justice Clearly: Handwritten Legal Document Translation with OCR and Vision-Language Models
Shubham Kumar Nigam, Parjanya Aditya Shukla, Noel Shallum, Arnab Bhattacharya

TL;DR
This paper compares traditional OCR plus machine translation pipelines with vision-language models for translating handwritten Marathi legal documents, aiming to improve digitization and accessibility of legal records in low-resource settings.
Contribution
It introduces and evaluates a unified vision-language model approach for direct translation of handwritten legal documents, demonstrating potential advantages over traditional pipelines.
Findings
Vision-language models can directly translate handwritten images effectively.
Traditional OCR-MT pipelines are less efficient for low-resource languages.
End-to-end models show promise for legal document digitization in India.
Abstract
Handwritten text recognition (HTR) and machine translation continue to pose significant challenges, particularly for low-resource languages like Marathi, which lack large digitized corpora and exhibit high variability in handwriting styles. The conventional approach to address this involves a two-stage pipeline: an OCR system extracts text from handwritten images, which is then translated into the target language using a machine translation model. In this work, we explore and compare the performance of traditional OCR-MT pipelines with Vision Large Language Models that aim to unify these stages and directly translate handwritten text images in a single, end-to-end step. Our motivation is grounded in the urgent need for scalable, accurate translation systems to digitize legal records such as FIRs, charge sheets, and witness statements in India's district and high courts. We evaluate both…
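The two approaches contrasted in the abstract can be sketched structurally as follows. This is a minimal illustration, not the authors' implementation: every name here (`OcrFn`, `TranslateFn`, `VlmFn`, the stub functions, and the prompt string) is hypothetical, and the stubs stand in for real OCR, translation, and vision-language models.

```python
from typing import Callable

# Hypothetical stage signatures; none of these names come from the paper.
OcrFn = Callable[[bytes], str]        # image bytes -> recognized source-language text
TranslateFn = Callable[[str], str]    # source-language text -> target-language text
VlmFn = Callable[[bytes, str], str]   # (image bytes, prompt) -> target-language text


def translate_two_stage(image: bytes, ocr: OcrFn, mt: TranslateFn) -> str:
    """Conventional pipeline: recognize the text first, then translate it.

    Any recognition error made by `ocr` is passed on to `mt` unchanged.
    """
    source_text = ocr(image)
    return mt(source_text)


def translate_end_to_end(
    image: bytes,
    vlm: VlmFn,
    prompt: str = "Translate the handwritten text in this image.",  # assumed prompt
) -> str:
    """Unified approach: one model maps the image directly to a translation."""
    return vlm(image, prompt)


# Placeholder stubs so the sketch runs without any model; assumptions only.
def fake_ocr(image: bytes) -> str:
    return "<recognized text>"


def fake_mt(text: str) -> str:
    return f"translated({text})"


def fake_vlm(image: bytes, prompt: str) -> str:
    return "translated(<image>)"


two_stage_output = translate_two_stage(b"", fake_ocr, fake_mt)
end_to_end_output = translate_end_to_end(b"", fake_vlm)
```

The only structural difference is the intermediate `source_text`: the two-stage version exposes it (and depends on its quality), while the end-to-end version never materializes it.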
Taxonomy
Topics: Handwritten Text Recognition Techniques · Topic Modeling · Multimodal Machine Learning Applications
