Correcting Arabic Soft Spelling Mistakes using BiLSTM-based Machine Learning
Gheith A. Abandah, Ashraf Suyyagh, Mohammed Z. Khedher

TL;DR
This paper introduces a BiLSTM-based method for correcting soft spelling mistakes in Arabic, addressing typographical and orthographic errors at the character level, achieving high accuracy without complex architectures.
Contribution
Proposes a novel BiLSTM approach for Arabic soft spelling correction that handles omission and addition errors efficiently at the character level.
Findings
Achieves 96.4% correction rate on injected errors
Character error rate of 1.28% on real test data
Effective low-resource training technique for spelling correction
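The character error rate reported above is conventionally computed as the total character-level edit distance between the model output and the reference text, divided by the total number of reference characters. The sketch below follows that standard definition; the paper's exact evaluation procedure is not shown on this page, so treat it as an assumption. The function names `levenshtein` and `character_error_rate` are illustrative, not taken from the authors' code.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, insertions, and deletions
    needed to turn `hyp` into `ref` (standard dynamic-programming edit distance)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free when chars match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(refs: list[str], hyps: list[str]) -> float:
    """Corpus-level CER: summed edit distance over summed reference length."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same number of lines")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("reference text is empty")
    total_edits = sum(levenshtein(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_chars


if __name__ == "__main__":
    # Toy soft-spelling example: taa marbuta written as haa,
    # hamzated alef written bare, alef maqsura written as yaa.
    refs = ["مدرسة", "إلى"]
    hyps = ["مدرسه", "الي"]
    print(character_error_rate(refs, hyps))  # 3 edits / 8 chars = 0.375
```

Because soft spelling mistakes also include omissions and additions, an edit-distance metric (rather than a position-by-position comparison) is the natural choice: it charges one edit for a missing or extra letter instead of misaligning everything after it.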
Abstract
Soft spelling errors are a class of spelling mistakes that is widespread among native Arabic speakers and foreign learners alike. Some of these errors are typographical in nature. They occur due to orthographic variations of some Arabic letters and the complex rules that dictate their correct usage. Many people forgo these rules, and given the identical phonetic sounds, they often confuse such letters. In this paper, we propose a bidirectional long short-term memory network that corrects this class of errors. We develop, train, evaluate, and compare a set of BiLSTM networks. We approach the spelling correction problem at the character level. We handle Arabic texts from both classical and modern standard Arabic. We treat the problem as a one-to-one sequence transcription problem. Since the soft Arabic errors class encompasses omission and addition mistakes, to preserve the one-to-one…
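The findings mention a correction rate on *injected* errors, which suggests training and evaluation pairs made by corrupting correct text. The sketch below is a minimal, hypothetical version of such stochastic error injection. It swaps letters within small confusion sets of phonetically similar Arabic letters. The confusion sets, the `inject_soft_errors` name, and the rate used in the example are illustrative assumptions, not the paper's configuration. The sketch injects only substitutions, so the corrupted and correct strings stay aligned one-to-one. The paper also covers omission and addition errors, which change the string length and are not modeled here.

```python
import random

# Illustrative confusion sets of letters that are commonly interchanged in
# soft spelling mistakes (an assumption, not the paper's exact list).
CONFUSION_SETS = [
    "اأإآ",  # bare alef and its hamza/madda variants
    "ةه",    # taa marbuta vs. haa
    "ىي",    # alef maqsura vs. yaa
]

# Map each confusable letter to the other members of its set.
CONFUSABLE = {
    ch: [other for other in group if other != ch]
    for group in CONFUSION_SETS
    for ch in group
}


def inject_soft_errors(text: str, rate: float, rng: random.Random) -> str:
    """Replace each confusable letter with a different member of its
    confusion set with probability `rate`; other characters are unchanged.

    Only substitutions are injected, so len(output) == len(text) and the
    (noisy, clean) pair stays aligned character by character.
    """
    out = []
    for ch in text:
        if ch in CONFUSABLE and rng.random() < rate:
            out.append(rng.choice(CONFUSABLE[ch]))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducible corruption
    clean = "مدرسة إلى الأستاذ"
    noisy = inject_soft_errors(clean, rate=0.3, rng=rng)  # arbitrary rate
    # (noisy, clean) forms one input/target pair for a character-level model.
    print(noisy, "->", clean)
```

Drawing fresh corruptions on each pass over the clean corpus, rather than fixing one noisy copy, exposes a model to many different error patterns for the same sentence.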
Taxonomy
Topics: Natural Language Processing Techniques · Handwritten Text Recognition Techniques · Text Readability and Simplification
Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory · Bidirectional LSTM · Dropout · Memory Network
