Misspellings in Natural Language Processing: A survey
Gianluca Sperduti, Alejandro Moreo

TL;DR
This survey reviews the challenges posed by misspellings in NLP, discusses recent mitigation strategies, datasets, and the impact on large language models, highlighting safety, ethical issues, and future research directions.
Contribution
It provides a comprehensive overview of misspelling challenges in NLP, summarizes recent advancements, and explores implications for large language models and ethical concerns.
Findings
Data augmentation and character-order-agnostic methods improve model robustness to misspelled input.
Benchmarks and datasets reveal performance gaps in handling misspellings.
Large language models still struggle with misspelled text, indicating room for improvement.
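As a rough illustration of the first finding, the sketch below shows one way each of the two strategies might look in practice. It is not taken from the survey: the function names (`swap_adjacent`, `augment`, `order_agnostic_key`), the `rate` and `seed` parameters, and the choice of adjacent-character swaps as the noise model are all assumptions made for this example. The order-agnostic key keeps the first and last character fixed and treats the interior as an unordered bag of characters, which is only one of several possible formulations.

```python
import random
from collections import Counter


def swap_adjacent(word, rng):
    """Return `word` with one random pair of adjacent inner characters swapped.

    Words shorter than 4 characters are returned unchanged, because they
    have fewer than two inner characters to swap.
    """
    if len(word) < 4:
        return word
    i = rng.randrange(1, len(word) - 2)  # i and i + 1 are both inner positions
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def augment(sentence, rate=0.3, seed=0):
    """Data augmentation (hypothetical noise model): corrupt each
    whitespace-separated token with probability `rate`."""
    rng = random.Random(seed)
    return " ".join(
        swap_adjacent(tok, rng) if rng.random() < rate else tok
        for tok in sentence.split()
    )


def order_agnostic_key(word):
    """Character-order-agnostic representation (one possible choice):
    first character, unordered counts of inner characters, last character."""
    if len(word) < 3:
        return (word, (), "")
    inner = tuple(sorted(Counter(word[1:-1]).items()))
    return (word[0], inner, word[-1])


clean = "misspellings degrade machine translation quality"
noisy = augment(clean, rate=1.0, seed=0)

# Inner-character swaps never change the order-agnostic key of a token.
same = all(
    order_agnostic_key(a) == order_agnostic_key(b)
    for a, b in zip(clean.split(), noisy.split())
)
print(noisy)
print(same)
```

In an augmentation setup, noisy copies such as `noisy` would be added to the training data alongside the clean text. In an order-agnostic setup, the model would instead consume a representation like `order_agnostic_key`, so that "model" and "mdoel" map to the same input.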
Abstract
This survey provides an overview of the challenges of misspellings in natural language processing (NLP). While often unintentional, misspellings have become ubiquitous in digital communication, especially with the proliferation of Web 2.0, user-generated content, and informal text mediums such as social media, blogs, and forums. Even if humans can generally interpret misspelled text, NLP models frequently struggle to handle it: this causes a decline in performance in common tasks like text classification and machine translation. In this paper, we reconstruct a history of misspellings as a scientific problem. We then discuss the latest advancements to address the challenge of misspellings in NLP. Main strategies to mitigate the effect of misspellings include data augmentation, double step, character-order agnostic, and tuple-based methods, among others. This survey also examines…
Taxonomy
Topics: Topic Modeling
