DuwatBench: Bridging Language and Visual Heritage through an Arabic Calligraphy Benchmark for Multimodal Understanding
Shubham Patle, Sara Ghaboura, Hania Tariq, Mohammad Usman Khan, Omkar Thawakar, Rao Muhammad Anwer, Salman Khan

TL;DR
DuwatBench is a new benchmark dataset designed to evaluate multimodal models on Arabic calligraphy, addressing challenges in recognizing artistic and stylized Arabic script and fostering progress in culturally grounded AI research.
Contribution
The paper introduces DuwatBench, an annotated Arabic calligraphy benchmark of 1,272 curated samples spanning six classical and modern calligraphic styles. It enables evaluation of multimodal models on complex artistic scripts and supports more inclusive AI development.
Findings
Multimodal models perform well on clean text but struggle with calligraphic variations.
DuwatBench reveals challenges in visual-text alignment for artistic Arabic scripts.
The dataset and evaluation tools are publicly available for further research.
Abstract
Arabic calligraphy represents one of the richest visual traditions of the Arabic language, blending linguistic meaning with artistic form. Although multimodal models have advanced across languages, their ability to process Arabic script, especially in artistic and stylized calligraphic forms, remains largely unexplored. To address this gap, we present DuwatBench, a benchmark of 1,272 curated samples containing about 1,475 unique words across six classical and modern calligraphic styles, each paired with sentence-level detection annotations. The dataset reflects real-world challenges in Arabic writing, such as complex stroke patterns, dense ligatures, and stylistic variations that often challenge standard text recognition systems. Using DuwatBench, we evaluated 13 leading Arabic and multilingual multimodal models and showed that while they perform well on clean text, they struggle with…
Taxonomy
Topics: Multimodal Machine Learning Applications · Language, Metaphor, and Cognition · Handwritten Text Recognition Techniques
