PsOCR: Benchmarking Large Multimodal Models for Optical Character Recognition in Low-resource Pashto Language
Ijazul Haq, Yingjie Zhang, Irfan Ali Khan

TL;DR
This paper benchmarks large multimodal models on Pashto OCR using a newly created synthetic dataset, revealing current model capabilities and limitations for low-resource, cursive scripts like Pashto.
Contribution
Introduces PsOCR, a large synthetic dataset for Pashto OCR, and evaluates multiple LMMs, providing insights into their performance on low-resource cursive scripts.
Findings
Gemini achieves the best OCR performance among all models.
Qwen-7B is the top open-source model for Pashto OCR.
Current LMMs have limitations in handling low-resource cursive scripts.
Abstract
This paper evaluates the performance of Large Multimodal Models (LMMs) on Optical Character Recognition (OCR) in the low-resource Pashto language. Natural Language Processing (NLP) in Pashto faces several challenges due to the cursive nature of its script and a scarcity of structured datasets. To address this, we developed a synthetic Pashto OCR dataset, PsOCR, consisting of one million images annotated with bounding boxes at word, line, and document levels, suitable for training and evaluating models based on different architectures, including Convolutional Neural Networks (CNNs) and Transformers. PsOCR covers variations across 1,000 unique font families, colors, image sizes, and layouts. A benchmark subset of 10K images was selected to evaluate the performance of several LMMs, including seven open-source models: DeepSeek's Janus, InternVL, MiniCPM, Florence, and Qwen (3B and 7B), and…
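The excerpt above does not say how the benchmark scores model output. As a minimal sketch, assuming a character error rate (CER) of the kind commonly used for OCR evaluation, the comparison between a ground-truth transcription and a model's prediction could look like the following. The function names `edit_distance` and `cer` are illustrative and do not come from the paper.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in Unicode code points."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from ref
                curr[j - 1] + 1,     # insertion into ref
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length.

    Assumption: an empty reference scores 0.0 against an empty hypothesis
    and 1.0 otherwise; other conventions exist.
    """
    if not ref:
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substituted letter in a four-letter Pashto word.
    print(cer("پښتو", "پشتو"))
```

Counting in code points is a simplification: for Arabic-script text, applying the same Unicode normalization to both strings before comparison avoids penalizing visually identical output.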
Taxonomy
Topics: Language, Linguistics, Cultural Analysis · Handwritten Text Recognition Techniques
Methods: Florence
