Glyce: Glyph-vectors for Chinese Character Representations
Yuxian Meng, Wei Wu, Fei Wang, Xiaoya Li, Ping Nie, Fan Yin, Muyu Li, Qinghong Han, Xiaofei Sun, Jiwei Li

TL;DR
Glyce introduces glyph-vectors for Chinese characters by leveraging historical scripts, specialized CNNs, and multi-task learning, significantly improving performance across various Chinese NLP tasks.
Contribution
The paper presents a novel glyph-based representation method for Chinese characters, incorporating historical scripts and tailored CNNs, achieving state-of-the-art results in multiple NLP tasks.
Findings
Outperforms ID-based models in Chinese NLP tasks
Achieves new state-of-the-art results in NER, CWS, POS, and text classification
Demonstrates the effectiveness of historical script data and multi-task learning
Abstract
It is intuitive that NLP tasks for logographic languages like Chinese should benefit from the use of the glyph information in those languages. However, due to the lack of rich pictographic evidence in glyphs and the weak generalization ability of standard computer vision models on character data, an effective way to utilize the glyph information remains to be found. In this paper, we address this gap by presenting Glyce, the glyph-vectors for Chinese character representations. We make three major innovations: (1) We use historical Chinese scripts (e.g., bronzeware script, seal script, traditional Chinese, etc.) to enrich the pictographic evidence in characters; (2) We design CNN structures (called tianzege-CNN) tailored to Chinese character image processing; and (3) We use image-classification as an auxiliary task in a multi-task learning setup to increase the model's ability to…
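The multi-task setup in point (3) can be pictured as a weighted sum of the main task loss and the auxiliary image-classification loss. The sketch below is only an illustration under stated assumptions: the exponentially decaying weight, the function names, and the default values of `lambda0` and `lambda1` are hypothetical choices and are not taken from the text above.

```python
def auxiliary_weight(step, lambda0=0.1, lambda1=0.8):
    """Weight given to the auxiliary loss at a training step.

    Assumption: an exponential decay lambda0 * lambda1**step, so the
    auxiliary image-classification signal matters most early in training.
    """
    return lambda0 * (lambda1 ** step)


def combined_loss(task_loss, image_cls_loss, step, lambda0=0.1, lambda1=0.8):
    """Blend the main task loss with the auxiliary image-classification loss."""
    lam = auxiliary_weight(step, lambda0, lambda1)
    return (1.0 - lam) * task_loss + lam * image_cls_loss


if __name__ == "__main__":
    # Illustrative loss values only; they are not results from the paper.
    for step in (0, 1, 5):
        print(step, combined_loss(task_loss=2.0, image_cls_loss=4.0, step=step))
```

With a schedule like this, the auxiliary task acts mostly as an early regularizer for the glyph encoder, and the objective approaches the plain task loss as training proceeds.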
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Handwritten Text Recognition Techniques
