CAMEL-Bench: A Comprehensive Arabic LMM Benchmark
Sara Ghaboura, Ahmed Heakl, Omkar Thawakar, Ali Alharthi, Ines Riahi, Abduljalil Saif, Jorma Laaksonen, Fahad S. Khan, Salman Khan, Rao M. Anwer

TL;DR
CAMEL-Bench is a comprehensive Arabic benchmark for large multimodal models (LMMs). It covers diverse domains, is designed to evaluate and improve LMMs on Arabic visual reasoning tasks, and comes with open-source tools for community use.
Contribution
This work introduces the first large-scale, multi-domain Arabic LMM benchmark with manually verified questions, filling a significant gap in multilingual multimodal evaluation.
Findings
GPT-4o scored 62% overall, indicating room for improvement.
Open-source models lag behind closed-source counterparts.
The benchmark covers 8 domains and 38 sub-domains, with 29,036 questions.
Abstract
Recent years have witnessed significant interest in developing large multimodal models (LMMs) capable of performing various visual reasoning and understanding tasks. This has led to the introduction of multiple LMM benchmarks to evaluate LMMs on different tasks. However, most existing LMM evaluation benchmarks are predominantly English-centric. In this work, we develop a comprehensive LMM evaluation benchmark for the Arabic language to represent a large population of over 400 million speakers. The proposed benchmark, named CAMEL-Bench, comprises eight diverse domains and 38 sub-domains, including multi-image understanding, complex visual perception, handwritten document understanding, video understanding, medical imaging, plant diseases, and remote sensing-based land use understanding, to evaluate broad scenario generalizability. Our CAMEL-Bench comprises around 29,036 questions that…
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Attention Is All You Need · Dense Connections · Label Smoothing · Byte Pair Encoding · Layer Normalization · Residual Connection · Linear Layer · Multi-Head Attention · Softmax · Adam
