Designing Production-Scale OCR for India: Multilingual and Domain-Specific Systems

Ali Faraz; Raja Kolla; Ashish Kulkarni; Shubham Agarwal

arXiv:2602.16430·cs.CV·February 19, 2026

TL;DR

This paper compares training strategies for multilingual OCR systems tailored to India, shows that fine-tuning an existing OCR model yields better accuracy-latency trade-offs, and introduces specialized models for government documents.

Contribution

The paper compares two training approaches for multilingual OCR, finds that fine-tuning an existing OCR model is superior, and presents new models that achieve state-of-the-art results and are evaluated on deployment-oriented metrics.

Findings

01

Fine-tuning yields better accuracy-latency trade-offs.

02

Chitrapathak-2 achieves a 3-6x speedup over its predecessor and is SOTA in Telugu.

03

Parichay extracts structured data with 89.8% accuracy.

Abstract

Designing Optical Character Recognition (OCR) systems for India requires balancing linguistic diversity, document heterogeneity, and deployment constraints. In this paper, we study two training strategies for building multilingual OCR systems with Vision-Language Models through the Chitrapathak series. We first follow a popular multimodal approach, pairing a generic vision encoder with a strong multilingual language model and training the system end-to-end for OCR. Alternatively, we explore fine-tuning an existing OCR model, even though it was not trained on the target languages. Through extensive evaluation on multilingual Indic OCR benchmarks and deployment-oriented metrics, we find that the second strategy consistently achieves better accuracy-latency trade-offs. Chitrapathak-2 achieves a 3-6x speedup over its predecessor while being state-of-the-art (SOTA) in Telugu (6.69 char ANLS) and…
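The abstract reports a character-level ANLS figure but this excerpt does not define the metric. As a rough illustration only, the sketch below computes a normalized character-level Levenshtein distance, the quantity that ANLS-style metrics are generally built on. The function names are hypothetical, and the normalization (dividing by the longer string's length) is an assumption; the paper's exact definition and scaling may differ.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance over Unicode code points (insert, delete, substitute)."""
    # Keep the shorter string in the inner loop to bound the row length.
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(pred: str, ref: str) -> float:
    """Edit distance divided by the longer length (assumed normalization)."""
    longest = max(len(pred), len(ref))
    if longest == 0:
        return 0.0  # two empty strings are treated as identical
    return levenshtein(pred, ref) / longest


if __name__ == "__main__":
    # Hypothetical OCR output compared against a reference transcription.
    print(normalized_edit_distance("kitten", "sitting"))
```

Note that Python strings iterate over code points rather than grapheme clusters, so a single visible character in an Indic script such as Telugu may count as several units here. Whether the paper's metric works on code points or on graphemes is not stated in this excerpt.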

Taxonomy

Topics: Handwritten Text Recognition Techniques · Topic Modeling · Speech Recognition and Synthesis