CAMEL-Bench: A Comprehensive Arabic LMM Benchmark

Sara Ghaboura; Ahmed Heakl; Omkar Thawakar; Ali Alharthi; Ines Riahi; Abduljalil Saif; Jorma Laaksonen; Fahad S. Khan; Salman Khan; Rao M. Anwer

arXiv:2410.18976·cs.CV·October 25, 2024

TL;DR

CAMEL-Bench is a comprehensive Arabic benchmark for large multimodal models (LMMs). It covers diverse domains, is designed to evaluate and improve LMMs on Arabic visual reasoning tasks, and comes with open-source tools for community use.

Contribution

This work introduces the first large-scale, multi-domain Arabic LMM benchmark with manually verified questions, filling a significant gap in multilingual multimodal evaluation.

Findings

01 GPT-4o scored 62% overall, indicating room for improvement.

02 Open-source models lag behind their closed-source counterparts.

03 The benchmark covers 8 domains and 38 sub-domains with 29,036 questions.

Abstract

Recent years have witnessed significant interest in developing large multimodal models (LMMs) capable of performing various visual reasoning and understanding tasks. This has led to the introduction of multiple LMM benchmarks to evaluate LMMs on different tasks. However, most existing LMM evaluation benchmarks are predominantly English-centric. In this work, we develop a comprehensive LMM evaluation benchmark for the Arabic language to represent a large population of over 400 million speakers. The proposed benchmark, named CAMEL-Bench, comprises eight diverse domains and 38 sub-domains, including multi-image understanding, complex visual perception, handwritten document understanding, video understanding, medical imaging, plant diseases, and remote sensing-based land use understanding, to evaluate broad scenario generalizability. Our CAMEL-Bench comprises around 29,036 questions that…

Code & Models

Repositories

mbzuai-oryx/CAMEL-Bench
Official

Datasets

ahmedheakl/arabic_mmmu
dataset · 71 downloads

Videos

CAMEL-Bench: A Comprehensive Arabic LMM Benchmark

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Attention Is All You Need · Dense Connections · Label Smoothing · Byte Pair Encoding · Layer Normalization · Residual Connection · Linear Layer · Multi-Head Attention · Softmax · Adam