Spelling Correction with Denoising Transformer

Alex Kuznetsov; Hector Urdiales

arXiv:2105.05977 · cs.CL · May 14, 2021 · 1 citation

Open Access

TL;DR

This paper introduces a transformer-based spelling correction method trained on artificially generated typos that mimic human errors, improving correction accuracy and extending to multiple languages without labeled data.

Contribution

The paper presents a novel typo generation procedure that models human error patterns, enhancing transformer-based spelling correction and enabling multilingual applications without labeled datasets.

Findings

01 Typo generation method outperforms noise-based approaches

02 Model successfully applied to multiple languages without labeled data

03 Improved spelling correction accuracy in practical applications

Abstract

We present a novel method of performing spelling correction on short input strings, such as search queries or individual words. At its core lies a procedure for generating artificial typos which closely follow the error patterns manifested by humans. This procedure is used to train the production spelling correction model based on a transformer architecture. This model is currently served in the HubSpot product search. We show that our approach to typo generation is superior to the widespread practice of adding noise, which ignores human patterns. We also demonstrate how our approach may be extended to resource-scarce settings and train spelling correction models for Arabic, Greek, Russian, and Setswana languages, without using any labeled data.
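The abstract contrasts typos that follow human error patterns with plain random noise, but it does not spell out the generation procedure. The sketch below is only an illustration of that contrast, not the authors' method: the four edit operations, the partial QWERTY adjacency map, and the function names `uniform_noise`, `humanlike_typo`, and `make_training_pairs` are all assumptions introduced here.

```python
import random

# Hypothetical, partial QWERTY adjacency map (an assumption for illustration,
# not taken from the paper).
QWERTY_NEIGHBORS = {
    "a": "qwsz", "s": "awedxz", "e": "wsdr", "r": "edft", "t": "rfgy",
    "o": "iklp", "i": "ujko", "n": "bhjm", "l": "kop", "c": "xdfv",
    "h": "gyujnb", "p": "ol",
}
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def uniform_noise(word, rng):
    """Baseline: replace one random position with a uniformly random letter."""
    if not word:
        return word
    i = rng.randrange(len(word))
    return word[:i] + rng.choice(ALPHABET) + word[i + 1:]


def humanlike_typo(word, rng):
    """Apply one edit of a kind people plausibly make while typing."""
    if len(word) < 2:
        return word
    op = rng.choice(["substitute", "transpose", "delete", "double"])
    i = rng.randrange(len(word) - 1)
    if op == "substitute":
        # Hit a neighboring key; characters missing from the map stay unchanged.
        neighbors = QWERTY_NEIGHBORS.get(word[i], word[i])
        return word[:i] + rng.choice(neighbors) + word[i + 1:]
    if op == "transpose":
        # Swap two adjacent characters.
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    if op == "delete":
        # Skip a character.
        return word[:i] + word[i + 1:]
    # Double a character.
    return word[:i] + word[i] + word[i:]


def make_training_pairs(words, rng):
    """Build (noisy input, clean target) pairs for a sequence-to-sequence model."""
    return [(humanlike_typo(w, rng), w) for w in words]
```

Calling `make_training_pairs(["search", "contact"], random.Random(0))` yields pairs whose second element is the clean word and whose first element differs from it by at most one edit. In this framing, the denoising model would be trained to map the noisy string back to the clean one, which is why no labeled corrections are needed.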

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Algorithms and Data Compression