Towards Real-World Writing Assistance: A Chinese Character Checking Benchmark with Faked and Misspelled Characters
Yinghui Li, Zishan Xu, Shaoshen Chen, Haojing Huang, Yangning Li, Yong Jiang, Zhongli Li, Qingyu Zhou, Hai-Tao Zheng, Ying Shen

TL;DR
This paper introduces Visual-C$^3$, a comprehensive Chinese character checking dataset that includes both faked and misspelled characters, aiming to improve real-world writing assistance tools.
Contribution
It presents the first large-scale, human-annotated dataset for Chinese character checking that includes faked characters, and evaluates baseline methods on this challenging dataset.
Findings
Visual-C$^3$ is high-quality and challenging.
Baseline methods show promising results but highlight the difficulty of the task.
The dataset will be publicly available for further research.
Abstract
Writing assistance is an application closely related to human life and is also a fundamental Natural Language Processing (NLP) research field. Its aim is to improve the correctness and quality of input texts, with character checking being crucial in detecting and correcting wrong characters. In the real world, where handwriting accounts for the vast majority of writing, the characters that humans get wrong include faked characters (i.e., untrue characters created due to writing errors) and misspelled characters (i.e., true characters used incorrectly due to spelling errors). However, existing datasets and related studies focus only on misspelled characters, which are mainly caused by phonological or visual confusion, thereby ignoring faked characters, which are more common and more difficult. To break through this dilemma, we present Visual-C$^3$, a human-annotated Visual Chinese Character Checking…
Taxonomy
Topics: Natural Language Processing Techniques · Handwritten Text Recognition Techniques · Topic Modeling
