Optimized Quran Passage Retrieval Using an Expanded QA Dataset and Fine-Tuned Language Models
Mohamed Basem, Islam Oshallah, Baraa Hikal, Ali Hamdi, and Ammar Mohamed

TL;DR
This paper enhances Quran passage retrieval by expanding the QA dataset and fine-tuning transformer models, significantly improving accuracy and handling of no-answer cases in the Holy Qur'an question-answering system.
Contribution
It updates and diversifies the Quran QA dataset and demonstrates improved model performance through fine-tuning transformer models, especially AraBERT-base.
Findings
MAP@10 improved by 63% with AraBERT-base
MRR increased by 59% after dataset expansion
75% success rate in no-answer cases
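The findings are reported as MAP@10, MRR, and a success rate on no-answer questions. The Python sketch below shows one way to compute such scores from ranked passage IDs. It is an illustration, not the paper's evaluation script, and it makes several assumptions:

- relevance is binary;
- AP@10 is normalized by min(|relevant|, 10), which is one common convention;
- zero-answer questions are scored separately and count as a success only when the system returns an empty ranking;
- the function names and toy passage IDs are hypothetical.

```python
from typing import Dict, List, Sequence, Set


def average_precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """AP@k for one answerable question (assumed normalization: min(|relevant|, k))."""
    if not relevant:
        raise ValueError("AP is undefined for zero-answer questions; score them separately")
    hits = 0
    precision_sum = 0.0
    for rank, pid in enumerate(ranked[:k], start=1):
        if pid in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at each relevant hit
    return precision_sum / min(len(relevant), k)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """1 / rank of the first relevant passage within the top k, else 0."""
    for rank, pid in enumerate(ranked[:k], start=1):
        if pid in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(run: Dict[str, List[str]], gold: Dict[str, Set[str]], k: int = 10) -> Dict[str, float]:
    """Average AP@k and RR over answerable questions; no-answer success rate over the rest."""
    ap_scores: List[float] = []
    rr_scores: List[float] = []
    no_answer_ok = 0
    no_answer_total = 0
    for qid, relevant in gold.items():
        ranked = run.get(qid, [])
        if not relevant:
            # Assumption: a zero-answer question is handled correctly iff nothing is returned.
            no_answer_total += 1
            no_answer_ok += int(len(ranked) == 0)
            continue
        ap_scores.append(average_precision_at_k(ranked, relevant, k))
        rr_scores.append(reciprocal_rank(ranked, relevant, k))
    return {
        f"MAP@{k}": sum(ap_scores) / len(ap_scores) if ap_scores else 0.0,
        "MRR": sum(rr_scores) / len(rr_scores) if rr_scores else 0.0,
        "no_answer_success": no_answer_ok / no_answer_total if no_answer_total else 0.0,
    }


# Toy example with hypothetical question and passage IDs.
gold = {"q1": {"p1", "p3"}, "q2": {"p2"}, "q3": set(), "q4": set()}
run = {"q1": ["p1", "p2", "p3"], "q2": ["x", "p2"], "q3": [], "q4": ["p9"]}
print(evaluate(run, gold))
```

Keeping zero-answer questions out of the MAP and MRR averages avoids defining precision over an empty relevant set. Other evaluation setups instead fold them into the ranking metrics, which would change the numbers.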
Abstract
Understanding the deep meanings of the Qur'an and bridging the language gap between Modern Standard Arabic and Classical Arabic are essential to improving question answering over the Holy Qur'an. The Qur'an QA 2023 shared task dataset contained a limited number of questions, and models trained on it showed weak retrieval performance. To address this challenge, this work updated the original dataset and improved model accuracy. The original dataset of 251 questions was reviewed and expanded to 629 questions through question diversification and reformulation, leading to a comprehensive set of 1895 categorized into single-answer, multi-answer, and zero-answer types. Extensive experiments fine-tuned transformer models, including AraBERT, RoBERTa, CAMeLBERT, AraELECTRA, and BERT. The best model, AraBERT-base, achieved a MAP@10 of 0.36 and an MRR of 0.59, representing improvements of 63% and 59%, respectively,…
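The three question types in the abstract can be assigned from the number of distinct gold passages per question. The sketch below assumes the dataset is stored as a dict from question ID to a list of passage IDs. The storage layout, function names, and IDs are hypothetical and do not describe the released data format.

```python
from collections import Counter
from typing import Dict, List


def answer_type(passage_ids: List[str]) -> str:
    """Label a question by how many distinct gold passages answer it."""
    n = len(set(passage_ids))  # de-duplicate in case a passage is listed twice
    if n == 0:
        return "zero-answer"
    if n == 1:
        return "single-answer"
    return "multi-answer"


def type_distribution(dataset: Dict[str, List[str]]) -> Counter:
    """Count how many questions fall into each answer type."""
    return Counter(answer_type(ids) for ids in dataset.values())


# Hypothetical question IDs and passage IDs, for illustration only.
dataset = {
    "q1": ["2:255-257"],
    "q2": ["2:1-5", "3:1-4"],
    "q3": [],
}
print(type_distribution(dataset))
```

Tracking this distribution after expansion makes it easy to check that zero-answer questions are represented, which matters for evaluating no-answer handling.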
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Linear Layer · Attention Is All You Need · Dense Connections · Multi-Head Attention · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · WordPiece
