Cross-Language Approach for Quranic QA
Islam Oshallah, Mohamed Basem, Ali Hamdi, Ammar Mohammed

TL;DR
This paper introduces a cross-language approach combining dataset augmentation and language model fine-tuning to improve Quranic question answering, addressing linguistic disparities and limited data challenges.
Contribution
It proposes a novel cross-language methodology that uses machine translation and pre-trained language models to improve Quranic QA performance.
Findings
RoBERTa-Base achieved MAP@10 of 0.34 and MRR of 0.52.
DeBERTa-v3-Base excelled in Recall@10 and Precision@10.
The approach significantly improves model performance in Quranic QA.
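For readers unfamiliar with the ranking metrics named above, the following is a minimal sketch of how MAP@10, MRR, Recall@10, and Precision@10 are commonly computed for a single query. It is not the authors' evaluation code. The function names, the verse identifiers, and the choice to normalize average precision by min(number of relevant items, k) are assumptions for illustration; the paper may use a different convention.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k slots filled with relevant items (divides by k)."""
    return sum(1 for item in ranked[:k] if item in relevant) / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for item in ranked[:k] if item in relevant) / len(relevant)


def average_precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Average of precision values at each rank where a relevant item occurs.

    Assumption: normalized by min(len(relevant), k); other conventions exist.
    """
    hits = 0
    score = 0.0
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank
    denom = min(len(relevant), k)
    return score / denom if denom else 0.0


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


# Hypothetical example: verse IDs are made up for illustration.
ranked_verses = ["v3", "v1", "v7", "v2"]
gold_verses = {"v1", "v2"}

p10 = precision_at_k(ranked_verses, gold_verses)
r10 = recall_at_k(ranked_verses, gold_verses)
ap10 = average_precision_at_k(ranked_verses, gold_verses)
rr = reciprocal_rank(ranked_verses, gold_verses)
```

MAP@10 and MRR are then the means of the per-query average precision and reciprocal rank values across all questions in the evaluation set.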
Abstract
Question answering systems face critical limitations in languages with limited resources and scarce data, making the development of robust models especially challenging. Quranic QA systems hold significant importance because they facilitate a deeper understanding of the Quran, a holy text for over a billion people worldwide. However, these systems face unique challenges, including the linguistic disparity between questions written in Modern Standard Arabic and answers found in Quranic verses written in Classical Arabic, and the small size of existing datasets, which further restricts model performance. To address these challenges, we adopt a cross-language approach by (1) Dataset Augmentation: expanding and enriching the dataset through machine translation to convert Arabic questions into English, paraphrasing questions to create linguistic diversity, and retrieving answers from an…
Taxonomy
Topics: Education and Islamic Studies
Methods: ADaptive gradient method with the OPTimal convergence rate · ALIGN · Flan-T5
