Improving Chinese Character Representation with Formation Tree
Yang Hong, Yinfei Li, Xiaojun Qiao, Rui Li, Junsong Zhang

TL;DR
This paper introduces Formation Tree-CLIP, a novel model that uses formation trees and a dedicated encoder to improve Chinese character recognition, especially for unseen characters, by leveraging inherent tree structures and efficient training techniques.
Contribution
The paper proposes a formation tree-based representation and encoder for Chinese characters, significantly improving recognition accuracy and training efficiency over previous radical-based sequence methods.
Findings
Enhanced recognition of unseen characters.
Training speed more than doubled.
Better alignment with the inherent properties of characters.
Abstract
Learning effective representations for Chinese characters presents unique challenges, primarily due to the vast number of characters and their continuous growth, which requires models to handle an expanding category space. Additionally, the inherent sparsity of character usage complicates the generalization of learned representations. Prior research has explored radical-based sequences to overcome these issues, achieving progress in recognizing unseen characters. However, these approaches fail to fully exploit the inherent tree structure of such sequences. To address these limitations and leverage established data properties, we propose Formation Tree-CLIP (FT-CLIP). This model utilizes formation trees to represent characters and incorporates a dedicated tree encoder, significantly improving performance in both seen and unseen character recognition tasks. We further introduce masking…
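To make the idea of a tree hidden inside a radical-based sequence concrete, here is a minimal sketch in Python. It assumes the sequence is written in Unicode Ideographic Description Sequence (IDS) prefix notation, which may differ from the formation-tree format the paper actually uses. The names `Node`, `parse_ids`, `leaves`, and `depth` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List

# Assumption: structure symbols are the Unicode Ideographic Description
# Characters U+2FF0..U+2FFB. Two of them take three operands, the rest two.
IDC = {chr(code) for code in range(0x2FF0, 0x2FFC)}
TERNARY = {"\u2ff2", "\u2ff3"}


@dataclass
class Node:
    """One tree node: a structure symbol (internal) or a radical (leaf)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def parse_ids(seq: str) -> Node:
    """Parse a prefix-notation sequence into a tree (hypothetical helper)."""
    pos = 0

    def parse() -> Node:
        nonlocal pos
        if pos >= len(seq):
            raise ValueError("sequence ended before the tree was complete")
        symbol = seq[pos]
        pos += 1
        if symbol not in IDC:
            return Node(symbol)  # a radical is a leaf
        arity = 3 if symbol in TERNARY else 2
        return Node(symbol, [parse() for _ in range(arity)])

    root = parse()
    if pos != len(seq):
        raise ValueError("trailing symbols after a complete tree")
    return root


def leaves(node: Node) -> List[str]:
    """Return the radicals in left-to-right order."""
    if not node.children:
        return [node.label]
    return [leaf for child in node.children for leaf in leaves(child)]


def depth(node: Node) -> int:
    """Return the number of levels in the tree."""
    return 1 + max((depth(child) for child in node.children), default=0)


# 想 decomposes as top-bottom(left-right(木, 目), 心).
tree = parse_ids("⿱⿰木目心")
```

The flat sequence lists five symbols in a row, while the parsed tree records that 木 and 目 are siblings under a left-right node, which in turn sits above 心. A sequence encoder has to infer that nesting from symbol order alone, which is the kind of structure a dedicated tree encoder can consume directly.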
Taxonomy
Topics: Natural Language Processing Techniques
