MizanQA: Benchmarking Large Language Models on Moroccan Legal Question Answering
Adil Bahaj, Mounir Ghogho

TL;DR
MizanQA is a new benchmark dataset for evaluating large language models on Moroccan legal questions. It highlights the challenges of low-resource settings with complex linguistic and legal structure, and reveals significant performance gaps in current models.
Contribution
The paper introduces MizanQA, a comprehensive Moroccan legal question answering benchmark dataset for LLM evaluation, emphasizing the need for culturally and domain-specific model development.
Findings
Multilingual and Arabic LLMs perform poorly on MizanQA.
The dataset captures complex legal reasoning in Arabic and French.
Results show significant gaps in current LLM capabilities for legal NLP.
Abstract
The rapid advancement of large language models (LLMs) has significantly propelled progress in natural language processing (NLP). However, their effectiveness in specialized, low-resource domains, such as Arabic legal contexts, remains limited. This paper introduces MizanQA (pronounced Mizan, meaning "scale" in Arabic, a universal symbol of justice), a benchmark designed to evaluate LLMs on Moroccan legal question answering (QA) tasks, which are characterised by rich linguistic and legal complexity. The dataset draws on Modern Standard Arabic, Islamic Maliki jurisprudence, Moroccan customary law, and French legal influences. Comprising over 1,700 multiple-choice questions, including multi-answer formats, MizanQA captures the nuances of authentic legal reasoning. Benchmarking experiments with multilingual and Arabic-focused LLMs reveal substantial performance gaps, highlighting the need for tailored…
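MizanQA mixes single-answer and multi-answer multiple-choice questions, and scoring the multi-answer items requires a choice that this summary does not spell out. The sketch below is an illustration only, not the paper's evaluation protocol. It assumes each item stores its gold answer as a set of option labels, and it contrasts strict exact-set matching (all and only the correct options) with a partial-credit Jaccard score. The `MCQItem` record and every function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MCQItem:
    """Hypothetical record for one benchmark question (not the dataset's schema)."""
    question: str
    options: dict[str, str]   # option label -> option text
    gold: frozenset[str]      # one label for single-answer items, several for multi-answer


def exact_match(pred: set[str], gold: frozenset[str]) -> float:
    """Strict scoring: 1.0 only if the predicted label set equals the gold set."""
    return 1.0 if frozenset(pred) == gold else 0.0


def jaccard(pred: set[str], gold: frozenset[str]) -> float:
    """Partial credit: |pred & gold| / |pred | gold| (1.0 when both are empty)."""
    union = set(pred) | gold
    if not union:
        return 1.0
    return len(set(pred) & gold) / len(union)


def score(items, predictions, metric) -> float:
    """Average a per-item metric over all items; expects one prediction per item."""
    if len(items) != len(predictions):
        raise ValueError("one prediction per item is required")
    total = sum(metric(pred, item.gold) for item, pred in zip(items, predictions))
    return total / len(items)


# Toy usage with placeholder items: one single-answer and one multi-answer question.
items = [
    MCQItem("q1", {"A": "...", "B": "...", "C": "..."}, frozenset({"B"})),
    MCQItem("q2", {"A": "...", "B": "...", "C": "..."}, frozenset({"A", "C"})),
]
predictions = [{"B"}, {"A"}]  # the second prediction misses option C
print(score(items, predictions, exact_match))  # 0.5
print(score(items, predictions, jaccard))      # 0.75
```

Under strict matching, a model that picks only some of the correct options on a multi-answer item gets no credit, while the Jaccard variant rewards it partially; which behaviour is appropriate depends on how the benchmark defines a correct answer.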
