Zero-Shot Chinese Character Recognition with Hierarchical Multi-Granularity Image-Text Aligning
Yinglian Zhu, Haiyang Yu, Qizao Wang, Wei Lu, Xiangyang Xue, Bin Li

TL;DR
This paper introduces Hi-GITA, a hierarchical multi-granularity image-text aligning framework for zero-shot Chinese character recognition, leveraging detailed semantic information to improve accuracy significantly over existing methods.
Contribution
The paper proposes a novel hierarchical multi-granularity encoding and contrastive alignment approach for zero-shot CCR, capturing detailed semantic cues at multiple levels.
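As a rough illustration of contrastive image-text alignment, the sketch below assumes a CLIP-style symmetric InfoNCE loss. Matched image/text embedding pairs sit on the diagonal of a cosine-similarity matrix. The loss pulls each image toward its own text and away from the other texts in the batch, and does the same in the text-to-image direction. Several details are assumptions for illustration: the function names, the temperature of 0.07, and the hypothetical `multi_granularity_loss` that averages the loss over granularity levels. This summary does not specify Hi-GITA's actual objective, and it may differ.

```python
import math
from typing import Dict, List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # avoid dividing by zero
    return [x / norm for x in v]


def cosine_matrix(a: List[Vector], b: List[Vector]) -> List[List[float]]:
    """sims[i][j] = cosine similarity between a[i] and b[j]."""
    a_n = [l2_normalize(v) for v in a]
    b_n = [l2_normalize(v) for v in b]
    return [[sum(x * y for x, y in zip(u, w)) for w in b_n] for u in a_n]


def diagonal_cross_entropy(logits: List[List[float]]) -> float:
    """Mean cross-entropy where row i's correct class is column i (the matched pair)."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_info_nce(images: List[Vector], texts: List[Vector],
                       temperature: float = 0.07) -> float:
    """CLIP-style loss: mean of image-to-text and text-to-image cross-entropy."""
    logits = [[s / temperature for s in row] for row in cosine_matrix(images, texts)]
    logits_t = [list(col) for col in zip(*logits)]  # transpose for text-to-image
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits_t))


def multi_granularity_loss(images: Dict[str, List[Vector]], texts: Dict[str, List[Vector]],
                           temperature: float = 0.07) -> float:
    """Hypothetical hierarchy: average the contrastive loss over shared granularity levels."""
    levels = sorted(images.keys() & texts.keys())
    if not levels:
        raise ValueError("no shared granularity levels")
    return sum(symmetric_info_nce(images[k], texts[k], temperature) for k in levels) / len(levels)
```

Under these assumptions, a correctly aligned batch gives a loss near zero. A batch whose pairs are swapped gives a loss close to 1/temperature.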
Findings
Achieves an accuracy improvement of about 20% in zero-shot handwritten character recognition.
Significantly outperforms existing zero-shot CCR methods.
Demonstrates the effectiveness of multi-granularity semantic modeling.
Abstract
Chinese Character Recognition (CCR) is a fundamental technology for intelligent document processing. Unlike Latin characters, Chinese characters exhibit unique spatial structures and compositional rules, which allow fine-grained semantic information to be used in their representation. However, existing approaches usually rely on auto-regressive decoding with edit-distance post-processing and typically use a single-level character representation. In this paper, we propose a Hierarchical Multi-Granularity Image-Text Aligning (Hi-GITA) framework based on a contrastive paradigm. To leverage the abundant fine-grained semantic information of Chinese characters, we propose multi-granularity encoders on both the image and text sides. Specifically, the Image Multi-Granularity Encoder extracts hierarchical image representations from character images, capturing semantic cues from localized strokes to…
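In a contrastive framework, zero-shot recognition is commonly done in two steps. First, compare an image embedding with the text embeddings of every candidate character, including characters never seen during training. Then pick the best match. The sketch below makes several assumptions, none of them taken from the paper:
- hypothetical granularity levels named `stroke`, `radical`, and `character`;
- a weighted sum of per-level cosine similarities as the fusion rule;
- precomputed text-side embeddings for each candidate.

```python
import math
from typing import Dict, List

Vector = List[float]
Levels = Dict[str, Vector]  # granularity level name -> embedding (hypothetical levels)


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def fused_similarity(image: Levels, text: Levels, weights: Dict[str, float]) -> float:
    """Assumed fusion rule: weighted sum of per-level cosine similarities."""
    return sum(w * cosine(image[level], text[level]) for level, w in weights.items())


def zero_shot_predict(image: Levels, candidates: Dict[str, Levels],
                      weights: Dict[str, float]) -> str:
    """Return the candidate character whose text embeddings best match the image."""
    return max(candidates, key=lambda ch: fused_similarity(image, candidates[ch], weights))


# Toy usage with 2-d embeddings; real encoders would output much larger vectors.
weights = {"stroke": 0.3, "radical": 0.3, "character": 0.4}
image = {"stroke": [1.0, 0.0], "radical": [1.0, 0.0], "character": [0.9, 0.1]}
candidates = {
    "木": {"stroke": [1.0, 0.0], "radical": [1.0, 0.0], "character": [1.0, 0.0]},
    "林": {"stroke": [1.0, 0.0], "radical": [0.0, 1.0], "character": [0.0, 1.0]},
}
print(zero_shot_predict(image, candidates, weights))  # prints 木
```

In this sketch, a new character enters the candidate set through its text-side description alone. No image examples of that character are needed, and no edit-distance post-processing is involved.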
Taxonomy
Topics: Handwritten Text Recognition Techniques · Text and Document Classification Technologies
