Arabic Spelling Correction using Supervised Learning

Youssef Hassan; Mohamed Aly; Amir Atiya

arXiv:1409.8309·cs.LG·October 1, 2014

TL;DR

This paper presents a supervised learning approach to Arabic spelling correction built on the QALB corpus. It targets the four most common error types and reaches an F1 score of 0.58 on the development set and 0.6 in the QALB 2014 shared task.

Contribution

It introduces a multi-model system tailored to Arabic spelling errors and demonstrates its effectiveness on a large annotated corpus and shared task evaluation.

Findings

01

Achieved an F1 score of 0.58 on the development set.

02

Participated in QALB 2014 shared task, ranking sixth with an F1 of 0.6.

03

Focused on the four most frequent error types (edit, add before, split, merge), which account for more than 90% of the spelling errors in the corpus.
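
To make the four targeted error types concrete, here is a minimal sketch of each as an operation on a list of tokens. It follows a plain reading of the type names only. The function names, the index-based arguments, and the Latin-script toy tokens are all assumptions made for illustration; they are not the QALB annotation format or the authors' implementation.

```python
from typing import List


def apply_edit(tokens: List[str], i: int, replacement: str) -> List[str]:
    """Edit: replace the token at position i with a corrected form."""
    return tokens[:i] + [replacement] + tokens[i + 1:]


def apply_add_before(tokens: List[str], i: int, new_token: str) -> List[str]:
    """Add before: insert a missing token (e.g. punctuation) before position i."""
    return tokens[:i] + [new_token] + tokens[i:]


def apply_split(tokens: List[str], i: int, at: int) -> List[str]:
    """Split: break one token, wrongly written as a single word, at character offset `at`."""
    word = tokens[i]
    return tokens[:i] + [word[:at], word[at:]] + tokens[i + 1:]


def apply_merge(tokens: List[str], i: int) -> List[str]:
    """Merge: join the token at position i with the one that follows it."""
    return tokens[:i] + [tokens[i] + tokens[i + 1]] + tokens[i + 2:]


# Toy examples (hypothetical data, Latin script for readability)
print(apply_edit(["teh", "cat"], 0, "the"))
print(apply_add_before(["yes", "he", "came"], 1, ","))
print(apply_split(["thecat"], 0, 3))
print(apply_merge(["ca", "t"], 0))
```

Each function returns a new list and leaves its input untouched, which keeps the four operations easy to compose or test in isolation.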

Abstract

In this work, we address the problem of spelling correction in the Arabic language using the new corpus provided by the QALB (Qatar Arabic Language Bank) project, an annotated corpus of sentences with errors and their corrections. The corpus contains edit, add before, split, merge, add after, move and other error types. We are concerned with the first four error types, as they account for more than 90% of the spelling errors in the corpus. The proposed system uses a separate model for each error type and then integrates all the models into an efficient and robust system that achieves an overall recall of 0.59, precision of 0.58 and F1 score of 0.58 over all error types on the development set. Our system participated in the QALB 2014 shared task "Automatic Arabic Error Correction" and achieved an F1 score of 0.6, earning the sixth place out of nine…
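
As a quick consistency check on the reported development-set figures, the sketch below recomputes F1 as the harmonic mean of precision and recall. It assumes the standard F1 definition and uses the rounded values from the abstract (precision 0.58, recall 0.59), so the result can only be expected to match the reported score to two decimal places. The function name `f1_score` is chosen here for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * precision * recall / (precision + recall)


# Rounded development-set values taken from the abstract
dev_f1 = f1_score(precision=0.58, recall=0.59)
print(round(dev_f1, 2))  # 0.58
```

The recomputed value agrees with the reported development-set F1 of 0.58.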
