Hierarchical Character Tagger for Short Text Spelling Error Correction
Mengyi Gao, Canran Xu, Peng Shi

TL;DR
This paper introduces HCTagger, a hierarchical character tagging model that efficiently corrects short text spelling errors using a character-level language model and a multi-task decoding approach, outperforming existing methods in speed and accuracy.
Contribution
The paper presents a novel hierarchical character tagging model that reduces the label space and improves decoding speed for spelling correction in short texts.
Findings
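On two public misspelling correction datasets, HCTagger is reported to achieve higher accuracy than existing models.
HCTagger is also reported to be significantly faster at inference.
The hierarchical multi-task decoding alleviates the long-tail label distribution without adding model parameters.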
Abstract
State-of-the-art approaches to the spelling error correction problem include Transformer-based Seq2Seq models, which require large training sets and suffer from slow inference, and sequence labeling models based on Transformer encoders like BERT, which operate over a token-level label space and therefore need a large pre-defined vocabulary. In this paper we present a Hierarchical Character Tagger model, or HCTagger, for short text spelling error correction. We use a pre-trained language model at the character level as a text encoder, and then predict character-level edits that transform the original text into its error-free form with a much smaller label space. For decoding, we propose a hierarchical multi-task approach to alleviate the issue of long-tail label distribution without introducing extra model parameters. Experiments on two public misspelling correction datasets demonstrate…
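To make the idea of character-level edit tags concrete, here is a minimal Python sketch that derives one tag per input character from a (misspelled, corrected) pair and then applies the tags to recover the corrected text. The tag names (`KEEP`, `DELETE`, `APPEND`, `REPLACE`), the extra start-of-text slot, and the helper names `char_edit_tags` and `apply_tags` are assumptions made for illustration; this summary does not specify the paper's actual label set or decoder. Splitting each tag into an operation and a payload only loosely mirrors the two-level idea of predicting an edit type before its characters.

```python
from difflib import SequenceMatcher


def char_edit_tags(source: str, target: str) -> list[tuple[str, str]]:
    """Return one (operation, payload) tag per slot.

    Slot 0 is a hypothetical start-of-text slot; slot i + 1 belongs to source[i].
    """
    # outputs[k] holds the text that slot k should emit after correction.
    outputs = [""] + list(source)
    matcher = SequenceMatcher(a=source, b=target, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            continue
        if op == "insert":
            # Attach inserted characters to the slot just before position i1.
            outputs[i1] += target[j1:j2]
        else:  # "delete" or "replace"
            for i in range(i1, i2):
                outputs[i + 1] = ""
            # The first slot of the span carries the replacement text (empty for delete).
            outputs[i1 + 1] = target[j1:j2]

    originals = [""] + list(source)
    tags = []
    for original, out in zip(originals, outputs):
        if out == original:
            tags.append(("KEEP", ""))
        elif out == "":
            tags.append(("DELETE", ""))
        elif out.startswith(original):
            tags.append(("APPEND", out[len(original):]))
        else:
            tags.append(("REPLACE", out))
    return tags


def apply_tags(source: str, tags: list[tuple[str, str]]) -> str:
    """Rebuild the corrected text from the source and its per-slot tags."""
    pieces = []
    for original, (op, payload) in zip([""] + list(source), tags):
        if op == "KEEP":
            pieces.append(original)
        elif op == "APPEND":
            pieces.append(original + payload)
        elif op == "REPLACE":
            pieces.append(payload)
        # DELETE emits nothing.
    return "".join(pieces)


if __name__ == "__main__":
    noisy, clean = "speling erorr", "spelling error"
    tags = char_edit_tags(noisy, clean)
    print(tags)
    print(apply_tags(noisy, tags))
```

In this sketch the operation set is tiny and the payloads are short character strings, which illustrates why a character-level tag space can be much smaller than a token-level vocabulary. A trained tagger would predict these tags from the encoder output rather than compute them from a known target.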
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Sigmoid Activation · Tanh Activation · Attention Dropout · Weight Decay · Linear Warmup With Linear Decay · Long Short-Term Memory
