From Press to Pixels: Evolving Urdu Text Recognition
Samee Arif, Sualeha Farid

TL;DR
This paper compares traditional OCR and LLMs for Urdu newspaper text recognition, introducing a new dataset and showing LLMs' effectiveness with limited data, while addressing layout and image quality challenges.
Contribution
It introduces the Urdu Newspaper Benchmark (UNB) dataset and demonstrates the effectiveness of fine-tuned LLMs for Urdu OCR in low-resource settings.
Findings
Gemini-2.5-Pro achieves WER 0.133 on UNB
Fine-tuning GPT-4o improves WER by 6.13% with limited samples
Super-resolution improves downstream OCR accuracy by an average of 50%
Abstract
This paper presents a comparative analysis of Large Language Models (LLMs) and traditional Optical Character Recognition (OCR) systems on Urdu newspapers, addressing challenges posed by complex multi-column layouts, low-resolution scans, and the stylistic variability of the Nastaliq script. To handle these challenges, we fine-tune YOLOv11x models for article- and column-level text block extraction and train a SwinIR-based super-resolution module that enhances image quality for downstream text recognition, improving accuracy by an average of 50%. We further introduce the Urdu Newspaper Benchmark (UNB), a manually annotated dataset for Urdu OCR comprising 829 paragraph images with a total of 9,982 sentences. Using UNB and the OpenITI corpus, we conduct a systematic comparison between traditional CNN+RNN-based OCR systems and modern LLMs, presenting detailed insertion, deletion, and…
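The WER figures and the insertion/deletion breakdown mentioned above rest on a word-level edit-distance alignment between a reference transcription and the OCR output. The following is a minimal sketch of that metric, assuming whitespace tokenization and unit cost for every edit; it is not the authors' evaluation code, and the function names `edit_ops` and `wer` are hypothetical.

```python
def edit_ops(ref_words, hyp_words):
    """Count (substitutions, deletions, insertions) in a minimum-cost
    Levenshtein alignment of hyp_words against ref_words."""
    n, m = len(ref_words), len(hyp_words)
    # dp[i][j] = (total_cost, subs, dels, ins) for ref[:i] vs hyp[:j]
    dp = [[None] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)      # delete every reference word
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)      # insert every hypothesis word
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if ref_words[i - 1] == hyp_words[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]   # match, no cost
                continue
            s, d, ins = dp[i - 1][j - 1], dp[i - 1][j], dp[i][j - 1]
            dp[i][j] = min(
                (s[0] + 1, s[1] + 1, s[2], s[3]),          # substitution
                (d[0] + 1, d[1], d[2] + 1, d[3]),          # deletion
                (ins[0] + 1, ins[1], ins[2], ins[3] + 1),  # insertion
            )
    return dp[n][m][1:]


def wer(reference, hypothesis):
    """Word error rate: (S + D + I) / number of reference words."""
    ref_words = reference.split()   # assumption: whitespace tokenization
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    subs, dels, ins = edit_ops(ref_words, hyp_words)
    return (subs + dels + ins) / len(ref_words)


if __name__ == "__main__":
    # One substitution (b -> x) and one deletion (d) over 4 reference words.
    print(wer("a b c d", "a x c"))
```

The same functions work unchanged on Urdu strings, since they only compare whitespace-separated tokens; any normalization of diacritics or punctuation before scoring would be a separate choice that this sketch does not make.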
Taxonomy
Topics: Handwritten Text Recognition Techniques · Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques
Methods: Attention Is All You Need · Softmax · Cosine Annealing · Attention Dropout · Linear Layer · Residual Connection · Byte Pair Encoding · Weight Decay · Dropout
