AraSpell: A Deep Learning Approach for Arabic Spelling Correction

Mahmoud Salhab; Faisal Abu-Khzam

arXiv:2405.06981 · cs.CL · May 14, 2024 · 2 cites

TL;DR

AraSpell is a deep learning framework for Arabic spelling correction. It uses seq2seq models (RNN and Transformer) trained on more than 6.9 million Arabic sentences with artificially injected errors, and reaches 4.8% WER and 1.11% CER against baseline error rates of 29.72% WER and 5.03% CER.

Contribution

Introduces AraSpell, a novel Arabic spelling correction system using seq2seq models and artificial data generation, with extensive empirical validation.
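
The artificial data generation mentioned above can be pictured as corrupting clean sentences to obtain (noisy, clean) training pairs. The following is a minimal sketch under stated assumptions, not the authors' procedure: the four edit operations, the `error_rate` parameter, the `ARABIC_LETTERS` alphabet, and the function name `inject_errors` are all hypothetical choices made for illustration.

```python
import random

# Hypothetical alphabet for illustration: the 28 basic Arabic letters.
ARABIC_LETTERS = "ابتثجحخدذرزسشصضطظعغفقكلمنهوي"


def inject_errors(sentence, error_rate=0.1, seed=None):
    """Return a corrupted copy of `sentence`.

    Each non-space character is corrupted with probability `error_rate`
    by one randomly chosen operation: delete, insert, substitute, or swap.
    These operations are assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    chars = list(sentence)
    out = []
    i = 0
    while i < len(chars):
        ch = chars[i]
        if ch != " " and rng.random() < error_rate:
            op = rng.choice(("delete", "insert", "substitute", "swap"))
            if op == "delete":
                pass  # drop the character
            elif op == "insert":
                # keep the character and add a random letter after it
                out.append(ch)
                out.append(rng.choice(ARABIC_LETTERS))
            elif op == "substitute":
                # replace the character with a random letter
                out.append(rng.choice(ARABIC_LETTERS))
            elif i + 1 < len(chars):
                # swap with the next character and skip over it
                out.append(chars[i + 1])
                out.append(ch)
                i += 1
            else:
                out.append(ch)  # nothing to swap with at the end
        else:
            out.append(ch)
        i += 1
    return "".join(out)


clean = "اللغة العربية جميلة"
training_pair = (inject_errors(clean, error_rate=0.2, seed=0), clean)
```

A seq2seq model would then be trained to map the first element of each pair back to the second. Fixing `seed` makes the corruption reproducible, which helps when comparing model variants on the same noisy data.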

Findings

01 Achieved 4.8% WER and 1.11% CER on test data

02 Reduced error rates from a baseline of 29.72% WER and 5.03% CER

03 Evaluated on a test set of 100K sentences

Abstract

Spelling correction is the task of identifying spelling mistakes, typos, and grammatical mistakes in a given text and correcting them according to their context and grammatical structure. This work introduces "AraSpell," a framework for Arabic spelling correction that uses different seq2seq model architectures, such as Recurrent Neural Network (RNN) and Transformer, with artificial data generation for error injection, trained on more than 6.9 million Arabic sentences. Thorough experimental studies provide empirical evidence of the effectiveness of the proposed approach, which achieved a word error rate (WER) of 4.8% and a character error rate (CER) of 1.11%, in comparison with labeled data of 29.72% WER and 5.03% CER. Our approach achieved 2.9% CER and 10.65% WER in comparison with labeled data of 10.02% CER and 50.94% WER. Both of these results are obtained on a test set of 100K…
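
The WER and CER figures quoted in the abstract are usually computed as an edit distance divided by the reference length, over words and characters respectively. The sketch below assumes that standard Levenshtein-based definition; the abstract does not spell out the exact formula used, and the helper names `edit_distance`, `wer`, `cer`, and `relative_reduction` are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate; `reference` must contain at least one word."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate; `reference` must be non-empty."""
    return edit_distance(reference, hypothesis) / len(reference)


def relative_reduction(baseline, corrected):
    """Fraction of the baseline error removed by correction."""
    return (baseline - corrected) / baseline
```

If the 29.72% WER and 5.03% CER figures are read as the error rates before correction, `relative_reduction(29.72, 4.8)` is roughly 0.84 and `relative_reduction(5.03, 1.11)` is roughly 0.78, that is, about 84% of word errors and 78% of character errors removed.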

Code & Models

Repositories

msalhab96/AraSpell (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems

Methods: Attention Is All You Need · Sparse Evolutionary Training · Linear Layer · Sigmoid Activation · Multi-Head Attention · Dense Connections · Position-Wise Feed-Forward Layer · Dropout · Tanh Activation · Label Smoothing