KNPTC: Knowledge and Neural Machine Translation Powered Chinese Pinyin Typo Correction
Hengyi Cai, Xingguang Ji, Yonghao Song, Yan Jin, Yang Zhang, Mairgup Mansur, Xiaofang Zhao

TL;DR
This paper introduces KNPTC, a method based on neural machine translation (NMT) that corrects pinyin typos by integrating explicit knowledge of users' typing behaviors, and that significantly outperforms existing systems.
Contribution
The paper presents a novel NMT approach to typo correction that incorporates explicit knowledge, in the form of transition probabilities between adjacent pinyin letters, without relying on manually selected constraints.
Findings
Achieves 32.77% improvement in typo correction accuracy
Effectively learns to correct diverse typos without manual features
Utilizes large-scale real-life datasets for training
Abstract
Chinese pinyin input methods are very important for Chinese language processing. In practice, users inevitably make typos when they input pinyin. Moreover, pinyin typo correction has become an increasingly important task with the popularity of smartphones and the mobile Internet. How to exploit knowledge of users' typing behaviors and support typo correction for acronym pinyin remains a challenging problem. To tackle these challenges, we propose KNPTC, a novel approach based on neural machine translation (NMT). In contrast to previous work, KNPTC is able to integrate explicit knowledge into NMT for pinyin typo correction, and is able to learn to correct a variety of typos without the guidance of manually selected constraints or language-specific features. In this approach, we first obtain the transition probabilities between adjacent letters based on large-scale real-life…
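The abstract breaks off right after the first step, which is estimating transition probabilities between adjacent letters from real-life typing data. As a rough illustration only, the Python sketch below assumes that a "transition" means typing one letter when another was intended, and that the data are (typed, intended) pinyin string pairs of equal length, so that letters can be aligned position by position. The function name `estimate_transition_probs` and the example pairs are hypothetical. This excerpt does not give the paper's actual definition, for instance whether it is restricted to keyboard-adjacent keys or how insertions and deletions are handled.

```python
from collections import Counter, defaultdict


def estimate_transition_probs(pairs):
    """Estimate P(typed letter | intended letter) from (typed, intended) pairs.

    Hypothetical sketch, not the KNPTC implementation. It assumes that
    equal-length pairs differ only by substitutions, so position i of the
    typed string aligns with position i of the intended string.
    """
    counts = defaultdict(Counter)  # intended letter -> Counter of typed letters
    for typed, intended in pairs:
        if len(typed) != len(intended):
            continue  # insertions/deletions are ignored in this simplified sketch
        for t, i in zip(typed, intended):
            counts[i][t] += 1

    # Normalize each row so the probabilities for one intended letter sum to 1.
    probs = {}
    for intended_letter, row in counts.items():
        total = sum(row.values())
        probs[intended_letter] = {typed_letter: c / total for typed_letter, c in row.items()}
    return probs


# Hypothetical (typed, intended) pairs: "nihap" has 'p' typed for 'o',
# and "zhobgguo" has 'b' typed for 'n'.
pairs = [("nihap", "nihao"), ("nihao", "nihao"), ("zhobgguo", "zhongguo")]
probs = estimate_transition_probs(pairs)
print(probs["o"])  # {'p': 0.25, 'o': 0.75}
```

Counting aligned substitutions and normalizing per intended letter gives a simple maximum-likelihood estimate. A real system would need much more data and an edit-distance alignment to handle typos that add or drop letters.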
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Biomedical Text Mining and Ontologies
