AraSpell: A Deep Learning Approach for Arabic Spelling Correction
Mahmoud Salhab, Faisal Abu-Khzam

TL;DR
AraSpell is a deep learning framework for Arabic spelling correction that uses seq2seq models (RNN and Transformer) trained on more than 6.9 million Arabic sentences with artificially injected errors. On test data it reaches 4.8% WER and 1.11% CER, against a baseline of 29.72% WER and 5.03% CER.
Contribution
Introduces AraSpell, a novel Arabic spelling correction system using seq2seq models and artificial data generation, with extensive empirical validation.
Findings
Achieved 4.8% WER and 1.11% CER on test data, against a baseline of 29.72% WER and 5.03% CER
In a second reported setting, achieved 10.65% WER and 2.9% CER, against a baseline of 50.94% WER and 10.02% CER
Evaluated on a test set of 100K sentences
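The WER and CER figures above are edit-distance metrics. The following is a minimal sketch of how such metrics are commonly computed, assuming the standard Levenshtein-based definitions (edits divided by reference length); the summary does not describe the authors' exact evaluation script, and the function names `edit_distance`, `wer`, and `cer` are illustrative only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    clean = "ذهب الولد إلى المدرسة"
    noisy = "ذهب الولد الى المدرسة"  # one misspelled word out of four
    print(f"WER: {wer(clean, noisy):.2%}  CER: {cer(clean, noisy):.2%}")
```

Because one wrong character makes the whole word count as an error, WER is typically several times larger than CER on the same text, which matches the gap between the two figures reported here.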
Abstract
Spelling correction is the task of identifying spelling mistakes, typos, and grammatical mistakes in a given text and correcting them according to their context and grammatical structure. This work introduces "AraSpell," a framework for Arabic spelling correction using different seq2seq model architectures, such as Recurrent Neural Network (RNN) and Transformer, with artificial data generation for error injection, trained on more than 6.9 million Arabic sentences. Thorough experimental studies provide empirical evidence of the effectiveness of the proposed approach, which achieved a word error rate (WER) of 4.8% and a character error rate (CER) of 1.11%, in comparison with labeled data of 29.72% WER and 5.03% CER. Our approach also achieved 2.9% CER and 10.65% WER in comparison with labeled data of 10.02% CER and 50.94% WER. Both of these results are obtained on a test set of 100K…
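The abstract mentions artificial data generation by error injection but does not describe the procedure. As an illustration only, the sketch below corrupts clean text with random character-level deletions, insertions, substitutions, and swaps to produce (noisy, clean) training pairs. The function `inject_errors`, the `error_rate` parameter, the four edit operations, and the `ARABIC_LETTERS` alphabet are all assumptions made for this example, not the authors' method.

```python
import random

# Assumed alphabet for inserted/substituted characters (28 basic Arabic letters).
ARABIC_LETTERS = "ابتثجحخدذرزسشصضطظعغفقكلمنهوي"


def inject_errors(sentence, error_rate=0.05, rng=None):
    """Return a corrupted copy of `sentence`.

    Each non-space character is edited with probability `error_rate`.
    Hypothetical scheme for illustration; not taken from the paper.
    """
    rng = rng or random.Random()
    chars = list(sentence)
    out = []
    i = 0
    while i < len(chars):
        ch = chars[i]
        if ch != " " and rng.random() < error_rate:
            op = rng.choice(("delete", "insert", "substitute", "swap"))
            if op == "delete":
                pass  # drop the character
            elif op == "insert":
                out.append(ch)
                out.append(rng.choice(ARABIC_LETTERS))
            elif op == "substitute":
                out.append(rng.choice(ARABIC_LETTERS))
            elif i + 1 < len(chars):
                # swap with the following character and skip past it
                out.append(chars[i + 1])
                out.append(ch)
                i += 1
            else:
                out.append(ch)  # nothing to swap with at the end
        else:
            out.append(ch)
        i += 1
    return "".join(out)


if __name__ == "__main__":
    clean = "ذهب الولد إلى المدرسة"
    rng = random.Random(0)  # fixed seed so the corruption is reproducible
    noisy = inject_errors(clean, error_rate=0.2, rng=rng)
    print((noisy, clean))  # one (source, target) pair for seq2seq training
```

In a setup like this, the corrupted sentence would be the model input and the original sentence the target, so unlimited training pairs can be generated from clean text alone.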
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems
Methods: Attention Is All You Need · Sparse Evolutionary Training · Linear Layer · Sigmoid Activation · Multi-Head Attention · Dense Connections · Position-Wise Feed-Forward Layer · Dropout · Tanh Activation · Label Smoothing
