A Comprehensive Approach to Misspelling Correction with BERT and Levenshtein Distance
Amirreza Naziri, Hossein Zeinali

TL;DR
This paper presents a novel method combining BERT and Levenshtein distance to effectively identify and correct various spelling errors in Persian text, demonstrating superior performance over existing systems.
Contribution
The paper introduces a combined approach that leverages BERT and Levenshtein distance for improved spelling correction, together with a comprehensive dataset of Persian spelling errors.
Findings
High accuracy in spelling correction
Outperforms existing Persian spelling correction systems
Effective handling of non-word and real-word errors
Abstract
Writing, as an omnipresent form of human communication, permeates nearly every aspect of contemporary life. Consequently, inaccuracies or errors in written communication can lead to profound consequences, ranging from financial losses to potentially life-threatening situations. Spelling mistakes, among the most prevalent writing errors, are frequently encountered due to various factors. This research aims to identify and rectify diverse spelling errors in text using neural networks, specifically leveraging the Bidirectional Encoder Representations from Transformers (BERT) masked language model. To achieve this goal, we compiled a comprehensive dataset encompassing both non-real-word and real-word errors after categorizing different types of spelling mistakes. Subsequently, multiple pre-trained BERT models were employed. To ensure optimal performance in correcting misspelling errors, we…
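The abstract is cut off before it describes how the BERT predictions and the Levenshtein distance are combined, so the following is only a minimal sketch of one plausible arrangement, not the authors' method. It assumes a masked language model has already proposed candidate words with probabilities for the position of a suspect token. Those candidates are hard-coded here as a hypothetical dictionary, and an English example is used for readability. The function names `rerank` and `correct`, the `max_distance` threshold, and the rule of filtering by edit distance before ranking by probability are all illustrative assumptions.

```python
from typing import Dict, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string as the row to save memory
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def rerank(typed: str,
           candidates: Dict[str, float],
           max_distance: int = 2) -> List[Tuple[str, float, int]]:
    """Keep candidates close to the typed word, best first.

    `candidates` maps each proposed word to a model probability
    (hypothetical stand-in for masked-LM output). Assumed rule:
    drop words farther than `max_distance` edits, then sort by
    probability (descending) and distance (ascending).
    """
    kept = []
    for word, prob in candidates.items():
        dist = levenshtein(typed, word)
        if dist <= max_distance:
            kept.append((word, prob, dist))
    kept.sort(key=lambda item: (-item[1], item[2]))
    return kept


def correct(typed: str,
            candidates: Dict[str, float],
            max_distance: int = 2) -> str:
    """Return the top surviving candidate, or the typed word if none survive."""
    ranked = rerank(typed, candidates, max_distance)
    return ranked[0][0] if ranked else typed


# Hypothetical model output for the masked slot in "I wonder [MASK] it will rain".
example_candidates = {"weather": 0.40, "whether": 0.35, "the": 0.20, "rain": 0.05}
suggestion = correct("wether", example_candidates)
```

Filtering by edit distance keeps the language model from replacing a word with something contextually likely but orthographically unrelated. Because the typed word itself is at distance 0, it also survives the filter whenever the model proposes it, which is one way a scheme like this could handle real-word errors.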
Taxonomy
Methods: Attention Is All You Need · Linear Layer · Attention Dropout · Linear Warmup With Linear Decay · Dense Connections · Multi-Head Attention · Residual Connection · Dropout · WordPiece
