Grounding Arabic LLMs in the Doha Historical Dictionary: Retrieval-Augmented Understanding of Quran and Hadith
Somaya Eltanbouly, Samer Rashwani

TL;DR
This paper introduces a retrieval-augmented generation (RAG) framework that grounds large language models in the Doha Historical Dictionary of Arabic to improve their understanding of complex Arabic religious texts, raising the accuracy of Arabic-native LLMs to over 85%.
Contribution
It presents a retrieval-augmented approach that draws on diachronic lexicographic knowledge to improve LLM performance on historical Arabic texts, in contrast to prior RAG systems that rely on general-purpose corpora.
Findings
Accuracy of Arabic-native LLMs, including Fanar and ALLaM, improved to over 85%
High agreement (kappa = 0.87) in automated evaluation
Identifies linguistic challenges like diacritics and compound expressions
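As a point of reference for the agreement figure above, the following is a minimal sketch of how a kappa score can be computed. The summary does not say which kappa statistic was used or which two sets of judgments were compared, so this assumes Cohen's kappa over two hypothetical label lists (for example, an automated judge and a human annotator). The function name and the toy labels are illustrative only.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long label sequences (assumed setup)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items where both raters gave the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum((freq_a[l] / n) * (freq_b[l] / n) for l in set(freq_a) | set(freq_b))
    if p_e == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_o - p_e) / (1.0 - p_e)

# Hypothetical judgments (1 = answer judged correct, 0 = judged incorrect).
judge = [1, 1, 0, 0]
human = [1, 1, 0, 1]
kappa = cohen_kappa(judge, human)
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is usually reported instead of simple percent agreement.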
Abstract
Large language models (LLMs) have achieved remarkable progress in many language tasks, yet they continue to struggle with complex historical and religious Arabic texts such as the Quran and Hadith. To address this limitation, we develop a retrieval-augmented generation (RAG) framework grounded in diachronic lexicographic knowledge. Unlike prior RAG systems that rely on general-purpose corpora, our approach retrieves evidence from the Doha Historical Dictionary of Arabic (DHDA), a large-scale resource documenting the historical development of Arabic vocabulary. The proposed pipeline combines hybrid retrieval with an intent-based routing mechanism to provide LLMs with precise, contextually relevant historical information. Our experiments show that this approach improves the accuracy of Arabic-native LLMs, including Fanar and ALLaM, to over 85%, substantially reducing the performance gap…
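The pipeline named in the abstract, hybrid retrieval combined with intent-based routing, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the abstract does not describe the scoring functions, the intents, or the data format. The entries are invented toy records (not DHDA content), a character-trigram cosine stands in for a dense embedding score, and the router is a simple keyword rule.

```python
import math
from collections import Counter

# Hypothetical toy entries with romanised lemmas; not taken from the DHDA.
ENTRIES = [
    {"lemma": "kitab", "gloss": "written document or book"},
    {"lemma": "qalam", "gloss": "reed pen used for writing"},
    {"lemma": "sayyara", "gloss": "caravan of travellers"},
]

def lexical_score(query, text):
    """Keyword-style signal: fraction of query tokens found in the text."""
    tokens = query.lower().split()
    vocab = set(text.lower().split())
    return sum(t in vocab for t in tokens) / len(tokens) if tokens else 0.0

def trigrams(text):
    text = text.lower()
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

def fuzzy_score(query, text):
    """Character-trigram cosine, used here as a stand-in for a dense score."""
    a, b = trigrams(query), trigrams(text)
    dot = sum(a[g] * b[g] for g in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def hybrid_search(query, entries, alpha=0.5, k=2):
    """Rank entries by a weighted mix of the two scores; drop zero-score hits."""
    scored = []
    for entry in entries:
        text = entry["lemma"] + " " + entry["gloss"]
        score = alpha * lexical_score(query, text) + (1 - alpha) * fuzzy_score(query, text)
        scored.append((score, entry))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for score, entry in scored[:k] if score > 0]

def route(question):
    """Hypothetical intent router: decide whether dictionary evidence is needed."""
    q = question.lower()
    cues = ("mean", "sense", "origin")
    return "lexical_lookup" if any(cue in q for cue in cues) else "direct_answer"

def build_prompt(question, term, entries):
    """Attach retrieved evidence only when the router asks for a lookup."""
    if route(question) == "lexical_lookup":
        hits = hybrid_search(term, entries)
        evidence = "\n".join(f"- {e['lemma']}: {e['gloss']}" for e in hits)
        return f"Evidence:\n{evidence}\n\nQuestion: {question}"
    return f"Question: {question}"

prompt = build_prompt("What does qalam mean in this verse?", "qalam", ENTRIES)
```

The weight `alpha` balances exact keyword matching against the fuzzier similarity signal. A real system would likely replace both scorers with an indexed lexical retriever and an embedding model, and the keyword router with a learned or LLM-based intent classifier.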
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
