From RAG to Agentic RAG for Faithful Islamic Question Answering

Gagan Bhatia; Hamdy Mubarak; Mustafa Jarrar; George Mikros; Fadi Zaraket; Mahmoud Alhirthani; Mutaz Al-Khatib; Logan Cochrane; Kareem Darwish; Rashid Yahiaoui; Firoj Alam

arXiv:2601.07528 · cs.CL · January 13, 2026

TL;DR

This paper introduces a new benchmark and an agentic RAG framework for faithful Islamic question answering, emphasizing hallucination detection, abstention, and grounded evidence retrieval to improve accuracy and robustness.

Contribution

It presents ISLAMICFAITHQA, a bilingual benchmark for Islamic QA, and develops an agentic RAG approach with structured tool calls for iterative evidence seeking and answer revision.
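The agentic loop can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation. A rule-based `toy_policy` stands in for the LLM. `search_verses` is a hypothetical toy lexical retriever over a placeholder verse-level corpus (`DEMO_CORPUS`, with made-up IDs and text). The tool names `search_verses`, `answer`, and `abstain` are illustrative. The sketch shows iterative evidence seeking through structured tool calls, with abstention as an explicit action. It does not model answer revision.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Verse:
    verse_id: str  # hypothetical ID; a real corpus would index each ayah
    text: str


@dataclass
class ToolCall:
    name: str  # hypothetical tool name requested by the agent
    arguments: dict = field(default_factory=dict)  # structured tool arguments


def tokens(text: str) -> set:
    """Lowercased word tokens (Unicode-aware \\w)."""
    return set(re.findall(r"\w+", text.lower()))


def search_verses(corpus: list, query: str, k: int = 2) -> list:
    """Toy lexical retriever: rank verses by token overlap with the query."""
    q = tokens(query)
    scored = [(len(q & tokens(v.text)), v) for v in corpus]
    hits = [(s, v) for s, v in scored if s > 0]  # drop verses with no overlap
    hits.sort(key=lambda sv: sv[0], reverse=True)  # stable: ties keep corpus order
    return [v for _, v in hits[:k]]


def toy_policy(question: str, evidence: list, step: int) -> ToolCall:
    """Stand-in for the LLM: choose the next structured tool call."""
    if not evidence and step == 0:
        return ToolCall("search_verses", {"query": question})
    if evidence:
        return ToolCall("answer", {"citations": [v.verse_id for v in evidence]})
    return ToolCall("abstain", {"reason": "no supporting verse retrieved"})


def run_agent(question: str, corpus: list, max_steps: int = 3) -> dict:
    """Dispatch tool calls until the policy answers, abstains, or runs out of steps."""
    evidence = []
    for step in range(max_steps):
        call = toy_policy(question, evidence, step)
        if call.name == "search_verses":
            evidence = search_verses(corpus, **call.arguments)
        elif call.name == "answer":
            return {"action": "answer", "citations": call.arguments["citations"]}
        else:
            return {"action": "abstain", "reason": call.arguments["reason"]}
    return {"action": "abstain", "reason": "step budget exhausted"}


# Placeholder corpus: NOT real verse text, only for exercising the loop.
DEMO_CORPUS = [
    Verse("v1", "placeholder verse about prayer times"),
    Verse("v2", "placeholder verse about charity and giving"),
]

if __name__ == "__main__":
    print(run_agent("When is prayer?", DEMO_CORPUS))   # answers, citing v1
    print(run_agent("What is fasting?", DEMO_CORPUS))  # abstains: no overlap
```

Keeping the policy (which tool to call next) separate from the dispatcher (what each tool does) mirrors the structured tool-call design. A real system would swap `toy_policy` for an LLM that emits the same `ToolCall` shape.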

Findings

1. Retrieval improves answer correctness.
2. Agentic RAG outperforms standard RAG in accuracy.
3. The framework achieves state-of-the-art performance with small models.

Abstract

LLMs are increasingly used for Islamic question answering, where ungrounded responses may carry serious religious consequences. Yet standard multiple-choice question (MCQ) and machine reading comprehension (MRC) evaluations do not capture key real-world failure modes, notably free-form hallucinations and whether models appropriately abstain when evidence is lacking. To shed light on these failure modes, we introduce ISLAMICFAITHQA, a 3,810-item bilingual (Arabic/English) generative benchmark with atomic single-gold answers, which enables direct measurement of hallucination and abstention. We additionally develop an end-to-end grounded Islamic modelling suite consisting of (i) 25K Arabic text-grounded SFT reasoning pairs, (ii) 5K bilingual preference samples for reward-guided alignment, and (iii) a verse-level Qur'an retrieval corpus of ~6K atomic verses (ayat). Building on these resources, we develop an agentic Qur'an-grounding framework…
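Atomic single-gold answers make hallucination and abstention countable per item. The following is a minimal sketch of that idea under loudly stated assumptions. A prediction of `None` counts as an abstention. Correctness is normalized exact match, with Arabic diacritics (Unicode category `Mn`) and punctuation stripped. The metric names are hypothetical. The benchmark's actual scoring protocol may differ.

```python
import re
import unicodedata
from typing import Optional, Sequence


def normalize(answer: str) -> str:
    """Assumed normalization: NFKC, lowercase, drop combining marks and punctuation."""
    text = unicodedata.normalize("NFKC", answer).lower()
    # Remove combining marks (e.g. Arabic harakat) so matching ignores diacritics.
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Mn")
    text = re.sub(r"[^\w\s]", " ", text)  # punctuation -> space
    return " ".join(text.split())  # collapse whitespace


def score(predictions: Sequence[Optional[str]], golds: Sequence[str]) -> dict:
    """Split each item into correct, hallucinated (answered but wrong), or abstained."""
    if len(predictions) != len(golds) or not golds:
        raise ValueError("need equal-length, non-empty predictions and golds")
    correct = hallucinated = abstained = 0
    for pred, gold in zip(predictions, golds):
        if pred is None:  # assumption: None marks an explicit abstention
            abstained += 1
        elif normalize(pred) == normalize(gold):
            correct += 1
        else:
            hallucinated += 1
    n = len(golds)
    answered = n - abstained
    return {
        "accuracy": correct / n,
        "hallucination_rate": hallucinated / n,
        "abstention_rate": abstained / n,
        "precision_when_answering": correct / answered if answered else 0.0,
    }


if __name__ == "__main__":
    print(score(["Makkah", None, "Madinah"], ["makkah.", "Zakat", "Makkah"]))
```

Separating hallucinations from abstentions is the point of the single-gold design. A model that abstains on an item it cannot ground lowers accuracy but not precision when answering, whereas a confident wrong answer lowers both.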


Code & Models

Datasets

QCRI/IslamicFaithQA


Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification