Harnessing Multilingual Resources to Question Answering in Arabic
Khalid Alnajjar, Mika Hämäläinen

TL;DR
This paper improves Arabic question answering by leveraging multilingual BERT and domain-specific Arabic corpora to augment training data and enhance answer prediction accuracy.
Contribution
It introduces a two-step BERT-based approach that combines multilingual data augmentation and candidate answer ranking for better Arabic QA performance.
Findings
Enhanced answer prediction accuracy with multilingual BERT.
Effective use of domain-specific Arabic corpus for training.
Improved ranking of candidate answers.
Abstract
The goal of the paper is to predict answers to questions given a passage of the Qur'an. The answers are always found in the passage, so the task of the model is to predict where an answer starts and where it ends. As the initial data set is rather small for training, we make use of multilingual BERT so that we can augment the training data with data available for languages other than Arabic. Furthermore, we crawl a large Arabic corpus that is domain-specific to religious discourse. Our approach consists of two steps. First, we train a BERT model to predict a set of possible answers in a passage. Second, we use another BERT-based model to rank the candidate answers produced by the first model.
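The two-step idea in the abstract (propose candidate answer spans, then rank them) can be illustrated with a minimal sketch. This is not the authors' implementation: the logits are made-up numbers standing in for the first model's start and end scores, and `toy_scorer` is a hypothetical placeholder for the second BERT-based ranking model. The names `top_k_spans`, `rerank`, and the `max_len` limit are assumptions introduced here for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Span = Tuple[int, int, float]  # (start index, end index, span score)


def top_k_spans(start_logits: Sequence[float], end_logits: Sequence[float],
                k: int = 3, max_len: int = 30) -> List[Span]:
    """Step 1 (sketch): score every span with start <= end and keep the k best.

    A span's score is the sum of its start and end logits, a common
    convention for extractive QA; the paper may score spans differently.
    """
    spans: List[Span] = []
    for s, s_score in enumerate(start_logits):
        last = min(s + max_len, len(end_logits))
        for e in range(s, last):
            spans.append((s, e, s_score + end_logits[e]))
    spans.sort(key=lambda span: span[2], reverse=True)
    return spans[:k]


def rerank(tokens: Sequence[str], spans: Sequence[Span],
           scorer: Callable[[Sequence[str]], float]) -> List[Tuple[str, float]]:
    """Step 2 (sketch): reorder candidates using a second scoring function."""
    scored = [(" ".join(tokens[s:e + 1]), scorer(tokens[s:e + 1]))
              for s, e, _ in spans]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def toy_scorer(span_tokens: Sequence[str]) -> float:
    """Hypothetical stand-in for the ranking model: prefers longer spans."""
    return float(len(span_tokens))


# Illustrative inputs only; real logits would come from a trained model.
tokens = ["a", "b", "c", "d"]
start_logits = [0.1, 2.0, 0.3, 0.0]
end_logits = [0.0, 0.5, 1.5, 0.2]

candidates = top_k_spans(start_logits, end_logits, k=3)
ranked = rerank(tokens, candidates, toy_scorer)
print(candidates)
print(ranked)
```

Keeping the two stages as separate functions mirrors the abstract's description: the first stage only has to recall plausible spans, while the second is free to use a different model to order them.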
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
Methods: Attention Is All You Need · Linear Layer · Softmax · Multi-Head Attention · Attention Dropout · Layer Normalization · Dropout · Dense Connections · Adam
