Advancements in Arabic Grammatical Error Detection and Correction: An Empirical Investigation
Bashar Alhafni, Go Inoue, Christian Khairallah, Nizar Habash

TL;DR
This paper introduces new Transformer-based models for Arabic grammatical error detection and correction, demonstrating improved performance and establishing benchmarks in a morphologically complex language.
Contribution
The paper defines the first multi-class Arabic GED task, feeds GED information into GEC models as an auxiliary input, and explores contextual morphological preprocessing for Arabic GEC.
Findings
The models achieve state-of-the-art results on two Arabic GEC shared task datasets.
Using GED information as an auxiliary input improves GEC performance across three datasets spanning different genres.
Morphological preprocessing aids error correction.
Abstract
Grammatical error correction (GEC) is a well-explored problem in English with many existing models and datasets. However, research on GEC in morphologically rich languages has been limited due to challenges such as data scarcity and language complexity. In this paper, we present the first results on Arabic GEC using two newly developed Transformer-based pretrained sequence-to-sequence models. We also define the task of multi-class Arabic grammatical error detection (GED) and present the first results on multi-class Arabic GED. We show that using GED information as an auxiliary input in GEC models improves GEC performance across three datasets spanning different genres. Moreover, we also investigate the use of contextual morphological preprocessing in aiding GEC systems. Our models achieve SOTA results on two Arabic GEC shared task datasets and establish a strong benchmark on a recently…
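As a rough illustration of what "GED information as an auxiliary input" can look like, the sketch below pairs each source token with a GED tag before the text would be handed to a sequence-to-sequence model. It is not the authors' implementation: the tag names (C, ORTH, MORPH), the angle-bracket format, and the helpers `to_binary` and `attach_ged_tags` are all assumptions made for illustration, and the tokens are placeholders rather than real Arabic data.

```python
from typing import List

# Hypothetical tag for a token that needs no correction (an assumption,
# not the paper's tag set, which this summary does not list).
CORRECT_TAG = "C"


def to_binary(tags: List[str]) -> List[str]:
    """Collapse multi-class GED tags into correct (C) vs. incorrect (I)."""
    return [CORRECT_TAG if tag == CORRECT_TAG else "I" for tag in tags]


def attach_ged_tags(tokens: List[str], tags: List[str]) -> str:
    """Interleave each token with its GED tag to build an augmented source string."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    return " ".join(f"{token} <{tag}>" for token, tag in zip(tokens, tags))


if __name__ == "__main__":
    # Placeholder tokens and hypothetical multi-class tags.
    tokens = ["tok1", "tok2", "tok3"]
    multi_class_tags = ["C", "ORTH", "MORPH"]

    print(attach_ged_tags(tokens, multi_class_tags))
    print(attach_ged_tags(tokens, to_binary(multi_class_tags)))
```

Collapsing the tags with `to_binary` shows the difference in granularity between binary and multi-class GED: the binary view only says that a token is wrong, while the multi-class view also says what kind of error it is.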
Code & Models
- 🤗 CAMeL-Lab/camelbert-msa-qalb14-ged-13 (model) · 274 dl · ♡ 1
- 🤗 CAMeL-Lab/camelbert-msa-qalb15-ged-13 (model) · 13 dl · ♡ 1
- 🤗 CAMeL-Lab/arabart-qalb14-gec-ged-13 (model) · 78 dl · ♡ 3
- 🤗 CAMeL-Lab/arabart-qalb15-gec-ged-13 (model) · 37 dl · ♡ 2
- 🤗 CAMeL-Lab/arabart-zaebuc-gec-ged-13 (model) · 7 dl · ♡ 2
- 🤗 CAMeL-Lab/camelbert-msa-zaebuc-ged-13 (model) · 10 dl · ♡ 3
- 🤗 CAMeL-Lab/camelbert-msa-zaebuc-ged-43 (model) · 4 dl
Taxonomy
Topics: Natural Language Processing Techniques · Text Readability and Simplification · Topic Modeling
