Optimized Quran Passage Retrieval Using an Expanded QA Dataset and Fine-Tuned Language Models

Mohamed Basem, Islam Oshallah, Baraa Hikal, Ali Hamdi, and Ammar Mohamed

arXiv:2412.11431 · cs.CL · December 17, 2024

Open Access

TL;DR

This paper enhances Quran passage retrieval by expanding the QA dataset and fine-tuning transformer models, significantly improving accuracy and handling of no-answer cases in the Holy Qur'an question-answering system.

Contribution

It updates and diversifies the Quran QA dataset and demonstrates improved model performance through fine-tuning transformer models, especially AraBERT-base.

Findings

01 MAP@10 reached 0.36 with AraBERT-base, a 63% improvement

02 MRR reached 0.59 after dataset expansion, a 59% improvement

03 75% success rate in no-answer cases
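The two ranking metrics named in the findings can be illustrated with a minimal Python sketch. It is not the authors' evaluation script: the passage IDs and the toy run are hypothetical, AP@10 is normalized by min(number of relevant passages, k), which is one common convention, and zero-answer questions are skipped because this page does not say how they are scored.

```python
from typing import Dict, Sequence, Set, Tuple

# One entry per question: (ranked passage IDs, set of relevant passage IDs).
Run = Dict[str, Tuple[Sequence[str], Set[str]]]


def average_precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """AP@k: sum of precision at each rank holding a relevant passage,
    divided by min(len(relevant), k) (an assumed normalization)."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, passage_id in enumerate(ranked[:k], start=1):
        if passage_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / min(len(relevant), k)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant passage, or 0.0 if none is retrieved."""
    for rank, passage_id in enumerate(ranked, start=1):
        if passage_id in relevant:
            return 1.0 / rank
    return 0.0


def map_at_k(run: Run, k: int = 10) -> float:
    """Mean AP@k over questions that have at least one relevant passage."""
    scored = [(r, rel) for r, rel in run.values() if rel]
    return sum(average_precision_at_k(r, rel, k) for r, rel in scored) / len(scored)


def mean_reciprocal_rank(run: Run) -> float:
    """Mean reciprocal rank over questions that have at least one relevant passage."""
    scored = [(r, rel) for r, rel in run.values() if rel]
    return sum(reciprocal_rank(r, rel) for r, rel in scored) / len(scored)


# Hypothetical toy run: two questions with made-up passage IDs.
toy_run: Run = {
    "q1": (["p3", "p1", "p7"], {"p1", "p7"}),  # relevant passages at ranks 2 and 3
    "q2": (["p2", "p9"], {"p2"}),              # relevant passage at rank 1
}

print(f"MAP@10 = {map_at_k(toy_run):.4f}")
print(f"MRR    = {mean_reciprocal_rank(toy_run):.4f}")
```

MAP@10 rewards placing all relevant passages high in the top ten, while MRR only looks at the first relevant passage, so the two numbers can move differently for multi-answer questions.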

Abstract

Understanding the deep meanings of the Qur'an and bridging the language gap between Modern Standard Arabic and Classical Arabic is essential to improve the question-answering system for the Holy Qur'an. The Qur'an QA 2023 shared task dataset had a limited number of questions and weak model retrieval performance. To address this challenge, this work updated the original dataset and improved the model accuracy. The original dataset, which contains 251 questions, was reviewed and expanded to 629 questions with question diversification and reformulation, leading to a comprehensive set of 1895 categorized into single-answer, multi-answer, and zero-answer types. Extensive experiments fine-tuned transformer models, including AraBERT, RoBERTa, CAMeLBERT, AraELECTRA, and BERT. The best model, AraBERT-base, achieved a MAP@10 of 0.36 and MRR of 0.59, representing improvements of 63% and 59%, respectively,…

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Linear Layer · Attention Is All You Need · Dense Connections · Multi-Head Attention · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · WordPiece