From RAG to Agentic: Validating Islamic-Medicine Responses with LLM Agents

Mohammad Amaan Sayeed; Mohammed Talha Alam; Raza Imam; Shahab Saquib Sohail; Amir Hussain

arXiv:2506.15911 · cs.CL · June 24, 2025

Open Access

TL;DR

This paper introduces Tibbe-AG, an evaluation pipeline for Islamic medical knowledge in LLMs. It reports that retrieval improves factual accuracy in medical QA by 13%, that agentic prompting adds a further 10%, and that combining both with classical texts improves reliability and cultural sensitivity.

Contribution

It develops a novel evaluation framework combining retrieval, self-critique, and agentic judgment to validate culturally grounded medical responses in LLMs.

Findings

01

Retrieval improves factual accuracy by 13%.

02

Agentic prompts add a further 10% accuracy gain.

03

Blending classical texts with retrieval and self-evaluation enhances reliability and cultural sensitivity.

Abstract

Centuries-old Islamic medical texts like Avicenna's Canon of Medicine and the Prophetic Tibb-e-Nabawi encode a wealth of preventive care, nutrition, and holistic therapies, yet remain inaccessible to many and underutilized in modern AI systems. Existing language-model benchmarks focus narrowly on factual recall or user preference, leaving a gap in validating culturally grounded medical guidance at scale. We propose a unified evaluation pipeline, Tibbe-AG, that aligns 30 carefully curated Prophetic-medicine questions with human-verified remedies and compares three LLMs (LLaMA-3, Mistral-7B, Qwen2-7B) under three configurations: direct generation, retrieval-augmented generation, and a scientific self-critique filter. Each answer is then assessed by a secondary LLM serving as an agentic judge, yielding a single 3C3H quality score. Retrieval improves factual accuracy by 13%, while the…
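The evaluation grid described in the abstract (three models, three configurations, one judge score per answer) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `evaluate_grid`, `stub_generate`, and `stub_judge`, the configuration labels, the question format, and the 0-to-1 score scale are all assumptions introduced here, and the stubs stand in for real LLM calls.

```python
from itertools import product
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Model names come from the abstract; the configuration labels are hypothetical.
MODELS = ["LLaMA-3", "Mistral-7B", "Qwen2-7B"]
CONFIGS = ["direct", "rag", "self_critique"]

# Each item pairs a question with its human-verified reference remedy (assumed format).
QAPair = Tuple[str, str]


def evaluate_grid(
    qa_pairs: List[QAPair],
    generate: Callable[[str, str, str], str],
    judge: Callable[[str, str, str], float],
) -> Dict[Tuple[str, str], float]:
    """Return the mean judge score for every (model, configuration) cell."""
    scores: Dict[Tuple[str, str], float] = {}
    for model, config in product(MODELS, CONFIGS):
        per_question = [
            judge(question, reference, generate(model, config, question))
            for question, reference in qa_pairs
        ]
        scores[(model, config)] = mean(per_question)
    return scores


def stub_generate(model: str, config: str, question: str) -> str:
    # Placeholder for a real LLM call under the given configuration.
    return f"[{model}/{config}] answer to: {question}"


def stub_judge(question: str, reference: str, answer: str) -> float:
    # Placeholder for the secondary LLM judge; returns a constant score.
    return 0.5


# 30 placeholder items, matching the number of curated questions in the abstract.
qa_pairs = [(f"question {i}", f"reference {i}") for i in range(1, 31)]
results = evaluate_grid(qa_pairs, stub_generate, stub_judge)
```

Replacing the two stubs with real model and judge calls would yield one aggregate score per cell, so the three configurations can be compared for each model.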

Taxonomy

Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Machine Learning in Healthcare