IslamicMMLU: A Benchmark for Evaluating LLMs on Islamic Knowledge

Ali Abdelaal; Mohammed Nader Al Haffar; Mahmoud Fawzi; Walid Magdy

arXiv:2603.23750 · cs.CL · April 6, 2026

TL;DR

IslamicMMLU is a comprehensive benchmark with over 10,000 questions across Islamic disciplines, used to evaluate and compare the performance of 26 large language models on Islamic knowledge tasks.

Contribution

The paper introduces IslamicMMLU, the first comprehensive benchmark for assessing LLMs on Islamic knowledge, together with a public leaderboard and an initial evaluation of 26 models.

Findings

01

Accuracy averaged across the three tracks ranged from 39.8% to 93.8%.

02

The Quran track showed the widest performance variation (32.4% to 99.3%).

03

Arabic-specific models underperformed compared to frontier models.

Abstract

Large language models are increasingly consulted for Islamic knowledge, yet no comprehensive benchmark evaluates their performance across core Islamic disciplines. We introduce IslamicMMLU, a benchmark of 10,013 multiple-choice questions spanning three tracks: Quran (2,013 questions), Hadith (4,000 questions), and Fiqh (jurisprudence, 4,000 questions). Each track comprises multiple question types that examine LLMs' capabilities in handling different aspects of Islamic knowledge. The benchmark is used to create the IslamicMMLU public leaderboard for evaluating LLMs, and we initially evaluate 26 LLMs, whose accuracy averaged across the three tracks ranged from 39.8% to 93.8% (by Gemini 3 Flash). The Quran track shows the widest span (32.4% to 99.3%), while the Fiqh track includes a novel madhab (Islamic school of jurisprudence) bias detection task revealing variable…
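The abstract reports accuracy "averaged across the three tracks" but the visible text does not say whether that average is weighted by track size. The sketch below only illustrates the arithmetic under both readings: the track sizes come from the abstract, while the function names and the per-track scores are hypothetical and are not results from the paper.

```python
from typing import Dict

# Track sizes as stated in the abstract (2,013 + 4,000 + 4,000 = 10,013).
TRACK_SIZES: Dict[str, int] = {"Quran": 2013, "Hadith": 4000, "Fiqh": 4000}


def macro_average(per_track_accuracy: Dict[str, float]) -> float:
    """Unweighted mean of per-track accuracies (each track counts equally)."""
    return sum(per_track_accuracy.values()) / len(per_track_accuracy)


def micro_average(per_track_accuracy: Dict[str, float],
                  sizes: Dict[str, int]) -> float:
    """Mean weighted by question count (each question counts equally)."""
    total = sum(sizes[track] for track in per_track_accuracy)
    correct = sum(per_track_accuracy[track] * sizes[track]
                  for track in per_track_accuracy)
    return correct / total


# Hypothetical scores for one model; NOT taken from the paper.
example_scores = {"Quran": 0.50, "Hadith": 0.80, "Fiqh": 0.80}

macro = macro_average(example_scores)
micro = micro_average(example_scores, TRACK_SIZES)
print(f"total questions: {sum(TRACK_SIZES.values())}")
print(f"macro average:   {macro:.4f}")
print(f"micro average:   {micro:.4f}")
```

Because the Quran track is about half the size of the other two, a model that is weak on Quran questions scores lower under the unweighted mean than under the question-weighted one, so the choice of averaging can shift leaderboard positions.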
