Qalam: A Multimodal LLM for Arabic Optical Character and Handwriting Recognition
Gagan Bhatia, El Moatez Billah Nagoudi, Fakhraddin Alwajih, Muhammad Abdul-Mageed

TL;DR
Qalam is a new multimodal foundation model for Arabic OCR and HWR that achieves state-of-the-art accuracy, effectively handles diacritics, and processes high-resolution inputs, advancing Arabic script recognition technology.
Contribution
Introduces Qalam, a multimodal model that pairs a SwinV2 encoder with a RoBERTa decoder and is designed specifically for Arabic OCR and HWR.
Findings
Achieves a Word Error Rate (WER) of 0.80% on HWR tasks and 1.18% on OCR tasks.
Trained on over 4.5 million images from Arabic manuscripts plus a synthetic dataset of 60k image-text pairs.
Demonstrates exceptional handling of Arabic diacritics and high-resolution inputs.
Abstract
Arabic Optical Character Recognition (OCR) and Handwriting Recognition (HWR) pose unique challenges due to the cursive and context-sensitive nature of the Arabic script. This study introduces Qalam, a novel foundation model designed for Arabic OCR and HWR, built on a SwinV2 encoder and RoBERTa decoder architecture. Our model significantly outperforms existing methods, achieving a Word Error Rate (WER) of just 0.80% in HWR tasks and 1.18% in OCR tasks. We train Qalam on a diverse dataset, including over 4.5 million images from Arabic manuscripts and a synthetic dataset comprising 60k image-text pairs. Notably, Qalam demonstrates exceptional handling of Arabic diacritics, a critical feature in Arabic scripts. Furthermore, it shows a remarkable ability to process high-resolution inputs, addressing a common limitation in current OCR systems. These advancements underscore Qalam's potential…
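The WER figures quoted above can be made concrete with a small sketch. The code below is illustrative only and is not the authors' evaluation script: it assumes whitespace tokenization and corpus-level WER (total word edits divided by total reference words), and the helper names `edit_distance`, `word_error_rate`, and `strip_diacritics` are hypothetical. Whether the paper scores with or without any text normalization is not stated in this summary; the optional `normalize` argument is included only to show why diacritics matter to the metric.

```python
import unicodedata


def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def strip_diacritics(text):
    """Drop nonspacing combining marks (Unicode category Mn), which
    include the Arabic short-vowel marks such as fatha (U+064E)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


def word_error_rate(references, hypotheses, normalize=None):
    """Corpus-level WER: total word edits / total reference words.

    `normalize` is an optional (assumed) text-cleaning function applied
    to both sides before comparison.
    """
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses differ in length")
    total_edits = 0
    total_words = 0
    for ref, hyp in zip(references, hypotheses):
        if normalize is not None:
            ref, hyp = normalize(ref), normalize(hyp)
        ref_words = ref.split()
        total_edits += edit_distance(ref_words, hyp.split())
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("reference corpus is empty")
    return total_edits / total_words


# Two-word reference whose first word carries three fatha marks,
# and a hypothesis that recognized the letters but dropped the marks.
reference = ["\u0643\u064E\u062A\u064E\u0628\u064E \u0627\u0644\u0648\u0644\u062F"]
hypothesis = ["\u0643\u062A\u0628 \u0627\u0644\u0648\u0644\u062F"]

strict_wer = word_error_rate(reference, hypothesis)
relaxed_wer = word_error_rate(reference, hypothesis, normalize=strip_diacritics)
```

Under exact matching, the word with the missing marks counts as one substitution out of two reference words, so `strict_wer` is 0.5; after stripping diacritics from both sides, `relaxed_wer` is 0.0. This is why a recognizer that drops or misplaces diacritics is penalized under a diacritic-sensitive WER even when every base letter is correct.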
Taxonomy
Topics: Handwritten Text Recognition Techniques · Hand Gesture Recognition Systems · Natural Language Processing Techniques
Methods: Attention Is All You Need · Attention Dropout · Linear Warmup With Linear Decay · Residual Connection · Adam · Dropout · Layer Normalization · Linear Layer · Weight Decay
