Toward Zero-shot Character Recognition: A Gold Standard Dataset with Radical-level Annotations
Xiaolei Diao, Daqian Shi, Jian Li, Lida Shi, Mingzhe Yue, Ruihua Qi, Chuntao Li, Hao Xu

TL;DR
This paper introduces ACCID, a comprehensive ancient Chinese character dataset with radical-level and character-level annotations, and proposes a baseline zero-shot OCR method based on radical decomposition, addressing the lack of both benchmarks and radical-level annotations.
Contribution
The paper creates ACCID, a novel dataset with radical-level annotations, and develops a baseline zero-shot OCR approach using radical decomposition and data augmentation techniques.
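The following minimal Python sketch shows why radical decomposition makes zero-shot recognition possible. It is not the authors' baseline. It assumes a recognizer has already predicted a structural relation and an ordered radical sequence, and it looks that pair up in a decomposition dictionary. The dictionary (DECOMPOSITIONS), the index, and match_character are hypothetical. The modern characters are illustrative stand-ins, not ACCID entries. Because the lookup depends only on radicals and structure, a character never seen in training can still be identified as long as its radicals were.

```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical decomposition dictionary: character -> (structural relation, radicals in order).
# "⿰" = left-right, "⿱" = top-bottom (Unicode Ideographic Description Characters).
# Modern characters are illustrative stand-ins, not entries from ACCID.
DECOMPOSITIONS: Dict[str, Tuple[str, Tuple[str, ...]]] = {
    "好": ("⿰", ("女", "子")),
    "明": ("⿰", ("日", "月")),
    "林": ("⿰", ("木", "木")),
    "杏": ("⿱", ("木", "口")),
    "呆": ("⿱", ("口", "木")),
}

# Inverted index: a predicted (structure, radicals) pair maps directly to a character.
INDEX: Dict[Tuple[str, Tuple[str, ...]], str] = {v: k for k, v in DECOMPOSITIONS.items()}


def match_character(structure: str, radicals: Sequence[str]) -> Optional[str]:
    """Return the character whose decomposition exactly matches the prediction, else None."""
    return INDEX.get((structure, tuple(radicals)))


# A recognizer that never saw "杏" in training but predicts 木 above 口:
print(match_character("⿱", ["木", "口"]))
```

Here "杏" and "呆" are built from the same two radicals and differ only in arrangement. Radical categories alone are therefore not enough, and the structural relation is needed to separate them. Exact matching is the simplest choice. A practical system would probably need to tolerate misrecognized radicals, for example by searching for the nearest dictionary entry.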
Findings
The ACCID dataset effectively supports zero-shot OCR research.
The baseline method achieves promising results on radical recognition.
Synthetic data augmentation improves OCR model robustness.
Abstract
Optical character recognition (OCR) methods have been applied to diverse tasks, e.g., street view text recognition and document analysis. Recently, zero-shot OCR has piqued the interest of the research community because it considers a practical OCR scenario with unbalanced data distribution. However, there is a lack of benchmarks for evaluating such zero-shot methods that apply a divide-and-conquer recognition strategy by decomposing characters into radicals. Meanwhile, radical recognition, as another important OCR task, also lacks radical-level annotation for model training. In this paper, we construct an ancient Chinese character image dataset that contains both radical-level and character-level annotations to satisfy the requirements of the above-mentioned methods, namely, ACCID, where radical-level annotations include radical categories, radical locations, and structural relations.…
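The abstract names three kinds of radical-level annotation: radical categories, radical locations, and structural relations. The Python sketch below shows one way to store them alongside the character-level label. The field names, the (x_min, y_min, x_max, y_max) box convention, and the image identifier are assumptions for illustration, not ACCID's actual file format. The helper infer_two_part_structure is a hypothetical heuristic that guesses a left-right or top-bottom relation from two radical boxes.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Tuple

# Assumed box convention: (x_min, y_min, x_max, y_max) in pixel coordinates.
BBox = Tuple[int, int, int, int]


@dataclass
class RadicalAnnotation:
    category: str  # radical category label
    bbox: BBox     # radical location within the character image


@dataclass
class CharacterAnnotation:
    image_id: str    # hypothetical image identifier
    character: str   # character-level label
    structure: str   # structural relation, e.g. "⿰" (left-right) or "⿱" (top-bottom)
    radicals: List[RadicalAnnotation] = field(default_factory=list)


def infer_two_part_structure(a: BBox, b: BBox) -> str:
    """Hypothetical heuristic: compare the centre offsets of two radical boxes.
    A larger horizontal offset suggests left-right ("⿰"), otherwise top-bottom ("⿱")."""
    ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    return "⿰" if abs(bx - ax) >= abs(by - ay) else "⿱"


# Illustrative record (made-up coordinates) for a top-bottom character.
record = CharacterAnnotation(
    image_id="img_0001",
    character="杏",
    structure="⿱",
    radicals=[
        RadicalAnnotation("木", (8, 2, 56, 34)),
        RadicalAnnotation("口", (18, 36, 46, 62)),
    ],
)

# Serialize to JSON; ensure_ascii=False keeps the characters readable.
print(json.dumps(asdict(record), ensure_ascii=False))
print(infer_two_part_structure(record.radicals[0].bbox, record.radicals[1].bbox))
```

Storing the structural relation explicitly, instead of deriving it from boxes at load time, avoids depending on a heuristic like the one above. Such a heuristic would break down for enclosing or overlapping layouts.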
