Misspellings in Natural Language Processing: A survey

Gianluca Sperduti; Alejandro Moreo

arXiv:2501.16836 · cs.CL · October 27, 2025

TL;DR

This survey reviews the challenges posed by misspellings in NLP, discusses recent mitigation strategies and datasets, and examines the impact of misspellings on large language models, highlighting safety and ethical issues as well as future research directions.

Contribution

It provides a comprehensive overview of misspelling challenges in NLP, summarizes recent advancements, and explores the implications for large language models as well as the related ethical concerns.

Findings

1. Data augmentation and character-order agnostic methods improve robustness.

2. Benchmarks and datasets reveal performance gaps in handling misspellings.

3. Large language models still struggle with misspelled text, indicating room for improvement.
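Data augmentation, named in the first finding, is commonly done by injecting synthetic misspellings into clean training text so a model sees noisy variants of each example. The sketch below is a minimal illustration under stated assumptions, not the survey's own procedure: it assumes four character-level edit types (adjacent swap, deletion, insertion, substitution), lowercase ASCII replacement letters, and a per-word noise rate; the function names `perturb_word`, `augment`, and `augment_dataset` are hypothetical.

```python
import random
import string


def perturb_word(word: str, rng: random.Random) -> str:
    """Apply one random character-level edit: swap, delete, insert, or substitute.

    Words shorter than two characters are returned unchanged. Swapping two
    identical adjacent letters also leaves the word unchanged.
    """
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)  # position in [0, len(word) - 2]
    op = rng.choice(["swap", "delete", "insert", "substitute"])
    if op == "swap":  # transpose adjacent characters i and i+1
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    if op == "delete":  # drop character i
        return word[:i] + word[i + 1:]
    if op == "insert":  # insert a random lowercase letter before position i
        return word[:i] + rng.choice(string.ascii_lowercase) + word[i:]
    # substitute character i with a different lowercase letter
    choices = [c for c in string.ascii_lowercase if c != word[i]]
    return word[:i] + rng.choice(choices) + word[i + 1:]


def augment(sentence: str, noise_rate: float = 0.3, seed: int = 0) -> str:
    """Perturb each whitespace-separated word with probability `noise_rate`."""
    rng = random.Random(seed)
    words = sentence.split()
    noisy = [perturb_word(w, rng) if rng.random() < noise_rate else w for w in words]
    return " ".join(noisy)


def augment_dataset(examples, copies=1, noise_rate=0.3, seed=0):
    """Return the clean (text, label) pairs plus `copies` noisy variants of each."""
    rng = random.Random(seed)
    out = list(examples)
    for text, label in examples:
        for _ in range(copies):
            out.append((augment(text, noise_rate, rng.randrange(2**32)), label))
    return out
```

For example, `augment_dataset([("the movie was great", 1)], copies=2)` keeps the clean sentence and adds two noisy copies with the same label. Keeping the clean originals alongside the noisy copies is one design choice among several; the edit types and noise rate would normally be tuned to the kind of misspellings expected at test time.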

Abstract

This survey provides an overview of the challenges posed by misspellings in natural language processing (NLP). While often unintentional, misspellings have become ubiquitous in digital communication, especially with the proliferation of Web 2.0, user-generated content, and informal text mediums such as social media, blogs, and forums. Although humans can generally interpret misspelled text, NLP models frequently struggle to handle it, which degrades performance on common tasks such as text classification and machine translation. In this paper, we reconstruct a history of misspellings as a scientific problem. We then discuss the latest advances in addressing misspellings in NLP. The main strategies to mitigate their effect include data augmentation, double-step, character-order agnostic, and tuple-based methods, among others. This survey also examines…
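One classic way to make a word representation character-order agnostic is a "semi-character" encoding: keep the first and last letters in place and treat the interior letters as an unordered bag, so transpositions inside a word do not change the input. The sketch below is an illustrative assumption, not necessarily one of the specific methods the survey covers; it assumes a lowercase a-z alphabet, ignores other characters, and the function name `semi_character_vector` is hypothetical.

```python
import string

ALPHABET = string.ascii_lowercase
INDEX = {c: i for i, c in enumerate(ALPHABET)}


def semi_character_vector(word: str) -> list[int]:
    """Encode a word as [first-char one-hot | interior-char counts | last-char one-hot].

    Characters outside a-z (after lowercasing) are dropped, a simplifying
    assumption. A one-letter word sets only the first-character slot.
    """
    w = [c for c in word.lower() if c in INDEX]
    n = len(ALPHABET)
    first, middle, last = [0] * n, [0] * n, [0] * n
    if not w:
        return first + middle + last
    first[INDEX[w[0]]] = 1
    if len(w) > 1:
        last[INDEX[w[-1]]] = 1
    # Interior letters are counted, so their order is discarded.
    for c in w[1:-1]:
        middle[INDEX[c]] += 1
    return first + middle + last
```

With this encoding, "Cambridge" and the scrambled "Cmabrigde" map to the same 78-dimensional vector, which a downstream classifier can consume directly. The trade-off is that insertions and deletions still change the vector, and some distinct words collide: "form" and "from" share the same encoding.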
