Improving Natural Language Inference in Arabic using Transformer Models and Linguistically Informed Pre-Training

Mohammad Majd Saad Al Deen; Maren Pielka; Jörn Hees; Bouthaina; Soulef Abdou; Rafet Sifa

arXiv:2307.14666 · cs.CL · July 28, 2023 · 1 citation

TL;DR

This paper enhances Arabic Natural Language Inference by creating a dedicated dataset and applying linguistically informed pre-training with transformer models, achieving competitive results in a resource-scarce language.

Contribution

It introduces a new Arabic NLI dataset and demonstrates the effectiveness of linguistically informed pre-training, including multi-task learning, for improving model performance.
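The summary names multi-task learning but does not specify how the tasks are combined. One common way to realize it is to share a single encoder across tasks and interleave mini-batches from an auxiliary task (such as NER) with the target NLI data. The sketch below shows only such a batch-interleaving schedule, under stated assumptions: the function name `multitask_batches`, the size-proportional sampling rule, and the toy data are illustrative and are not taken from the paper or its repository.

```python
import random
from typing import Dict, Iterator, List, Sequence, Tuple


def multitask_batches(
    datasets: Dict[str, Sequence[object]],
    batch_size: int,
    seed: int = 0,
) -> Iterator[Tuple[str, List[object]]]:
    """Yield (task_name, batch) pairs, interleaving batches from all tasks.

    Hypothetical schedule: at each step a task is drawn with probability
    proportional to its number of remaining batches, so every example of
    every task is visited exactly once per epoch. In a full model, the task
    name would select that task's output head on a shared encoder.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    rng = random.Random(seed)
    # Pre-split each task's examples into consecutive batches.
    queues = {
        name: [list(data[i:i + batch_size]) for i in range(0, len(data), batch_size)]
        for name, data in datasets.items()
    }
    while any(queues.values()):
        names = [name for name, queue in queues.items() if queue]
        weights = [len(queues[name]) for name in names]
        task = rng.choices(names, weights=weights, k=1)[0]
        yield task, queues[task].pop(0)


if __name__ == "__main__":
    ner_data = [("القاهرة", "B-LOC")] * 5  # toy token-level NER examples
    nli_data = [("premise", "hypothesis", "entailment")] * 3  # toy NLI pairs
    for task, batch in multitask_batches({"ner": ner_data, "nli": nli_data}, batch_size=2):
        print(task, len(batch))
```

Sampling by remaining batch count keeps the larger auxiliary task from being exhausted early; other schedules (strict alternation, or NER first and NLI afterwards) are equally plausible readings of "multi-task pre-training".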

Findings

01 With NER pre-training, the language-specific model AraBERT performs competitively with multilingual models

02 Linguistically informed pre-training improves NLI accuracy in Arabic

03 First large-scale evaluation of Arabic NLI with multi-task pre-training
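NER pre-training treats entity recognition as token classification. The findings do not say which tag scheme the NER data uses; BIO (begin/inside/outside) is a widespread choice, so the sketch below assumes it. `spans_to_bio` is a hypothetical helper that converts entity spans, given as token offsets with an exclusive end, into one tag per token; it does not handle overlapping or nested entities.

```python
from typing import List, Tuple


def spans_to_bio(num_tokens: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert (start, end_exclusive, type) entity spans to BIO tags.

    Assumes non-overlapping spans over whitespace tokens; tokens outside
    every span are tagged "O".
    """
    tags = ["O"] * num_tokens
    for start, end, ent_type in spans:
        if not 0 <= start < end <= num_tokens:
            raise ValueError(f"invalid span: {(start, end, ent_type)}")
        tags[start] = f"B-{ent_type}"  # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{ent_type}"  # continuation tokens of the same entity
    return tags


if __name__ == "__main__":
    # "Muhammad studied at Cairo University"
    tokens = "درس محمد في جامعة القاهرة".split()
    tags = spans_to_bio(len(tokens), [(1, 2, "PER"), (3, 5, "ORG")])
    print(list(zip(tokens, tags)))
```

For the example sentence the tags come out as `O, B-PER, O, B-ORG, I-ORG`, which is the per-token target an NER head on the shared encoder would be trained to predict.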

Abstract

This paper addresses the classification of Arabic text data in the field of Natural Language Processing (NLP), with a particular focus on Natural Language Inference (NLI) and Contradiction Detection (CD). Arabic is considered a resource-poor language: few datasets are available, which in turn limits the availability of NLP methods. To overcome this limitation, we create a dedicated dataset from publicly available resources. We then train and evaluate transformer-based machine learning models. We find that a language-specific model (AraBERT) performs competitively with state-of-the-art multilingual approaches when we apply linguistically informed pre-training methods such as Named Entity Recognition (NER). To our knowledge, this is the first large-scale evaluation for this task in Arabic, as well as the first application of multi-task pre-training…
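NLI and contradiction detection are usually fed to a BERT-style model such as AraBERT as sentence-pair classification: premise and hypothesis are packed into one sequence with separator tokens and segment ids, and a classifier on the `[CLS]` position predicts the label. The sketch below illustrates only that input layout, under explicit assumptions: whitespace splitting stands in for AraBERT's real subword tokenizer, and `encode_pair`, `NLI_LABELS`, the three-way label set, and the binary CD mapping are hypothetical rather than taken from the paper's dataset.

```python
from typing import Dict, List, Tuple

# Hypothetical three-way NLI label set.
NLI_LABELS: Dict[str, int] = {"entailment": 0, "neutral": 1, "contradiction": 2}


def to_cd_label(nli_label: str) -> int:
    """Hypothetical binary CD target: 1 for contradiction, 0 otherwise."""
    return int(nli_label == "contradiction")


def encode_pair(premise: str, hypothesis: str, max_len: int = 16) -> Tuple[List[str], List[int]]:
    """Pack a premise/hypothesis pair as [CLS] p [SEP] h [SEP] with segment ids.

    Whitespace tokens stand in for real subword tokens. If the pair is too
    long, tokens are dropped from the end of the currently longer side.
    """
    if max_len < 3:
        raise ValueError("max_len must leave room for [CLS] and two [SEP] tokens")
    p, h = premise.split(), hypothesis.split()
    while len(p) + len(h) > max_len - 3:
        if len(p) >= len(h):
            p.pop()
        else:
            h.pop()
    tokens = ["[CLS]"] + p + ["[SEP]"] + h + ["[SEP]"]
    # Segment 0 covers [CLS], the premise and the first [SEP]; segment 1 the rest.
    segment_ids = [0] * (len(p) + 2) + [1] * (len(h) + 1)
    return tokens, segment_ids


if __name__ == "__main__":
    # "The boy went to school" / "The boy is at home"
    tokens, segments = encode_pair("ذهب الولد إلى المدرسة", "الولد في البيت")
    print(tokens)
    print(segments)
```

Truncating the longer side first mirrors a common "longest first" strategy; a real pipeline would delegate both tokenization and truncation to the model's own tokenizer.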

Code & Models

Repositories

fraunhofer-iais/arabic_nlp (official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies

Methods: Focus