Automatic Error Type Annotation for Arabic

Riadh Belkebir; Nizar Habash

arXiv:2109.08068 · cs.CL · September 17, 2021

TL;DR

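ARETA automatically annotates error types in Modern Standard Arabic text and is designed to handle Arabic's morphological richness and orthographic ambiguity. Apart from a large morphological analyzer it is completely unsupervised, and it reaches 85.8% micro-average F1 on a blind test portion of the Arabic Learner Corpus (ALC).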

Contribution

The paper introduces ARETA, an error type annotation system for Arabic whose taxonomy is a modified version of the ALC Error Tagset, and demonstrates its usability by analyzing submissions from the QALB 2014 shared task on Arabic grammatical error correction.

Findings

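01. Achieved 85.8% micro-average F1 on a manually annotated blind test portion of ALC.

02. Produced error-type analyses that expose the strengths and weaknesses of individual error correction systems, which the M2 scores used in the shared task do not show.

03. Demonstrated applicability to real submissions from the QALB 2014 shared task for Arabic grammatical error correction.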

Abstract

We present ARETA, an automatic error type annotation system for Modern Standard Arabic. We design ARETA to address Arabic's morphological richness and orthographic ambiguity. We base our error taxonomy on the Arabic Learner Corpus (ALC) Error Tagset with some modifications. ARETA achieves a performance of 85.8% (micro average F1 score) on a manually annotated blind test portion of ALC. We also demonstrate ARETA's usability by applying it to a number of submissions from the QALB 2014 shared task for Arabic grammatical error correction. The resulting analyses give helpful insights on the strengths and weaknesses of different submissions, which is more useful than the opaque M2 scoring metrics used in the shared task. ARETA employs a large Arabic morphological analyzer, but is completely unsupervised otherwise. We make ARETA publicly available.
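The headline number above is a micro-average F1 score. As a rough illustration of what that metric computes, the sketch below pools true positives, false positives, and false negatives across all error tags before computing precision and recall. This is not ARETA's evaluation code: the function name `micro_f1`, the representation of each item as a set of tags, and the tag strings are all assumptions made for the example.

```python
from typing import List, Set


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1 over per-item tag sets.

    Counts are pooled over all items and all tags, so frequent
    tags weigh more than rare ones.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # tags predicted and present in gold
        fp += len(p - g)   # tags predicted but absent from gold
        fn += len(g - p)   # gold tags that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical tag names, used only for illustration.
gold = [{"TAG_A"}, {"TAG_B"}, {"TAG_C", "TAG_D"}, set()]
pred = [{"TAG_A"}, {"TAG_E"}, {"TAG_C"}, set()]
score = micro_f1(gold, pred)
print(f"micro-F1 = {score:.4f}")
```

In this toy input there are 2 pooled true positives, 1 false positive, and 2 false negatives, giving precision 2/3 and recall 1/2. Whether the paper scores items with multiple tags in exactly this way is not stated in the abstract, so the set-based treatment here should be read as one plausible convention.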
