Spelling Correction with Denoising Transformer
Alex Kuznetsov, Hector Urdiales

TL;DR
This paper introduces a transformer-based spelling correction method trained on artificially generated typos that mimic human errors, improving correction accuracy and extending to multiple languages without labeled data.
Contribution
The paper presents a novel typo generation procedure that models human error patterns, enhancing transformer-based spelling correction and enabling multilingual applications without labeled datasets.
Findings
Generating typos that follow human error patterns outperforms the common practice of adding random noise
The approach extends to Arabic, Greek, Russian, and Setswana without labeled data
The resulting model is served in production in the HubSpot product search
Abstract
We present a novel method of performing spelling correction on short input strings, such as search queries or individual words. At its core lies a procedure for generating artificial typos which closely follow the error patterns manifested by humans. This procedure is used to train the production spelling correction model based on a transformer architecture. This model is currently served in the HubSpot product search. We show that our approach to typo generation is superior to the widespread practice of adding noise, which ignores human patterns. We also demonstrate how our approach may be extended to resource-scarce settings and train spelling correction models for Arabic, Greek, Russian, and Setswana languages, without using any labeled data.
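The abstract does not spell out the typo generation procedure, so the following is only a generic sketch of the idea it describes: corrupt clean strings with errors that resemble human typing, then use the (noisy, clean) pairs as training data for a denoising model. The keyboard adjacency map, the operation weights, and the names `make_typo` and `make_training_pairs` are all assumptions made for illustration, not the authors' method. A real system would estimate these statistics from observed human errors.

```python
import random

# Hypothetical QWERTY adjacency map (illustrative, not from the paper).
QWERTY_NEIGHBORS = {
    "q": "wa", "w": "qes", "e": "wrd", "r": "etf", "t": "ryg",
    "y": "tuh", "u": "yij", "i": "uok", "o": "ipl", "p": "o",
    "a": "qsz", "s": "awdx", "d": "sefc", "f": "drgv", "g": "fthb",
    "h": "gyjn", "j": "hukm", "k": "jil", "l": "ko",
    "z": "ax", "x": "zsc", "c": "xdv", "v": "cfb", "b": "vgn",
    "n": "bhm", "m": "nj",
}

# Assumed relative frequencies of edit operations (made-up values).
OPERATION_WEIGHTS = {
    "substitute": 0.40,
    "delete": 0.25,
    "insert": 0.20,
    "transpose": 0.15,
}


def make_typo(word, rng):
    """Return `word` with one simulated typing error applied."""
    if len(word) < 2:
        return word
    ops = list(OPERATION_WEIGHTS)
    weights = list(OPERATION_WEIGHTS.values())
    op = rng.choices(ops, weights=weights)[0]
    i = rng.randrange(len(word))
    neighbors = QWERTY_NEIGHBORS.get(word[i].lower(), "")

    if op == "substitute" and neighbors:
        # Hit a key adjacent to the intended one.
        return word[:i] + rng.choice(neighbors) + word[i + 1:]
    if op == "insert" and neighbors:
        # Accidentally press a neighboring key as well.
        return word[: i + 1] + rng.choice(neighbors) + word[i + 1:]
    if op == "transpose":
        # Swap two adjacent characters.
        i = min(i, len(word) - 2)
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    # Delete a character (also the fallback when no neighbors are known).
    return word[:i] + word[i + 1:]


def make_training_pairs(words, rng):
    """Build (noisy input, clean target) pairs for a denoising model."""
    return [(make_typo(w, rng), w) for w in words]


if __name__ == "__main__":
    rng = random.Random(0)
    for noisy, clean in make_training_pairs(["contacts", "workflow", "email"], rng):
        print(noisy, "->", clean)
```

By contrast, the noise-only baseline the abstract criticizes would pick replacement characters uniformly at random, ignoring which errors people actually make.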
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Algorithms and Data Compression
