LMSpell: Neural Spell Checking for Low-Resource Languages

Akesh Gunathilake; Nadil Karunarathna; Tharusha Bandaranayake; Nisansa de Silva; Surangika Ranathunga

arXiv:2512.05414 · cs.CL · December 12, 2025

TL;DR

This paper evaluates how well pretrained language models correct spelling in low-resource languages. It finds that large language models outperform encoder-based and encoder-decoder models when the fine-tuning dataset is large, and it introduces LMSpell, a toolkit for this task.

Contribution

It provides the first empirical comparison of pretrained language models (PLMs) for spell correction that covers low-resource languages, and it releases LMSpell, a practical toolkit with a built-in evaluation function.

Findings

01

LLMs outperform encoder-based and encoder-decoder models when the fine-tuning dataset is large

02

This holds even for languages the LLM was not pre-trained on

03

LMSpell includes an evaluation function that compensates for LLM hallucination

Abstract

Spell correction is still a challenging problem for low-resource languages (LRLs). While pretrained language models (PLMs) have been employed for spell correction, their use is still limited to a handful of languages, and there has been no proper comparison across PLMs. We present the first empirical study on the effectiveness of PLMs for spell correction, which includes LRLs. We find that Large Language Models (LLMs) outperform their counterparts (encoder-based and encoder-decoder) when the fine-tuning dataset is large. This observation holds even in languages for which the LLM is not pre-trained. We release LMSpell, an easy-to-use spell correction toolkit across PLMs. It includes an evaluation function that compensates for the hallucination of LLMs. Further, we present a case study with Sinhala to shed light on the plight of spell correction for LRLs.
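
This page does not describe how LMSpell's evaluation function works. As a hedged illustration of why hallucination-tolerant scoring matters, the Python sketch below contrasts naive position-by-position word accuracy with accuracy computed after aligning words using the standard library's `difflib.SequenceMatcher`. The function names are hypothetical (not LMSpell's API), the sketch assumes whitespace-tokenized sentences, and it is not the toolkit's actual method.

```python
from difflib import SequenceMatcher


def positional_word_accuracy(reference: str, prediction: str) -> float:
    """Naive metric: compare words position by position.

    A single inserted or dropped word shifts every later position,
    so one hallucinated token can wreck the score.
    """
    ref, pred = reference.split(), prediction.split()
    if not ref:
        return 1.0 if not pred else 0.0
    hits = sum(r == p for r, p in zip(ref, pred))
    return hits / len(ref)


def aligned_word_accuracy(reference: str, prediction: str) -> float:
    """Hypothetical hallucination-tolerant metric: align words before scoring.

    Reference words are counted as correct only if they fall inside a
    matching block of the alignment, so an inserted word affects only itself.
    """
    ref, pred = reference.split(), prediction.split()
    if not ref:
        return 1.0 if not pred else 0.0
    # autojunk=False so frequent words (e.g. "the") are never treated as junk
    matcher = SequenceMatcher(None, ref, pred, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    pred = "the cat sat quietly on the mat"  # model inserted an extra word
    print(positional_word_accuracy(ref, pred))  # 0.5
    print(aligned_word_accuracy(ref, pred))     # 1.0
```

Dividing by `len(ref)` ignores extra predicted words entirely; dividing the matched count by `len(pred)` instead would penalize insertions, so a real evaluation would likely report both.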

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · ICT in Developing Communities