Investigating Glyph Phonetic Information for Chinese Spell Checking: What Works and What's Next
Xiaotian Zhang, Yanjun Zheng, Hang Yan, Xipeng Qiu

TL;DR
This paper examines how glyph and phonetic information influence Chinese spell checking models, assesses their generalization capabilities, and proposes a new challenging testing setting to guide future improvements.
Contribution
It provides insights into the use of glyph-phonetic information in CSC and introduces a more rigorous evaluation framework for model generalization.
Findings
Glyph-phonetic information improves spell checking accuracy.
Current models have limited generalization to new error types.
Proposed setting reveals gaps in model robustness.
Abstract
While pre-trained Chinese language models have demonstrated impressive performance on a wide range of NLP tasks, the Chinese Spell Checking (CSC) task remains a challenge. Previous research has explored using information such as glyphs and phonetics to improve the ability to distinguish misspelled characters, with good results. However, the generalization ability of these models is not well understood: it is unclear whether they incorporate glyph-phonetic information and, if so, whether this information is fully utilized. In this paper, we aim to better understand the role of glyph-phonetic information in the CSC task and suggest directions for improvement. Additionally, we propose a new, more challenging, and practical setting for testing the generalizability of CSC models. All code is made publicly available.
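One way to make "generalization to new error types" concrete is to check how many of the character confusions in a test set never appear in the training set. The sketch below illustrates that idea only; it is not the evaluation setting the paper proposes. The helper names `error_pairs` and `unseen_pair_ratio` and the toy sentences are hypothetical, and the code assumes each sample is a (misspelled, corrected) sentence pair of equal length, which is the usual substitution-only form of CSC data.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]  # (wrong character, correct character)


def error_pairs(samples: Iterable[Tuple[str, str]]) -> Set[Pair]:
    """Collect (wrong, correct) character pairs from aligned sentence pairs."""
    pairs: Set[Pair] = set()
    for source, target in samples:
        if len(source) != len(target):
            continue  # assumption: only substitution errors, so lengths match
        for wrong, correct in zip(source, target):
            if wrong != correct:
                pairs.add((wrong, correct))
    return pairs


def unseen_pair_ratio(train: Iterable[Tuple[str, str]],
                      test: Iterable[Tuple[str, str]]) -> float:
    """Fraction of distinct test error pairs that never occur in training."""
    train_pairs = error_pairs(train)
    test_pairs = error_pairs(test)
    if not test_pairs:
        return 0.0
    return len(test_pairs - train_pairs) / len(test_pairs)


# Toy data, invented for illustration (not taken from any benchmark).
train = [("我喜欢吃平果", "我喜欢吃苹果")]
test = [("他买了平果", "他买了苹果"), ("今天天气很号", "今天天气很好")]

print(unseen_pair_ratio(train, test))
```

A high ratio means the test set mostly probes confusions the model has not been trained on, so its scores say more about generalization than about memorized error pairs.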
Code & Models
- 🤗 Macropodus/macbert4mdcspell_v1 · model · 40k dl · ♡ 2
- 🤗 Macropodus/macbert4csc_v2 · model · 8 dl · ♡ 2
- 🤗 Macropodus/macbert4csc_v1 · model · 5 dl · ♡ 1
- 🤗 Macropodus/bert4csc_v1 · model · 4 dl · ♡ 1
- 🤗 Macropodus/relm_v1 · model · 42 dl · ♡ 1
- 🤗 Macropodus/macbert4mdcspell_v2 · model · 283 dl · ♡ 6
- 🤗 Macropodus/macbert4mdcspell_v3 · model · 310 dl · ♡ 1
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
