Glyph-aware Embedding of Chinese Characters
Falcon Z. Dai, Zheng Cai

TL;DR
This paper introduces a glyph-aware embedding of Chinese characters: a CNN reads each character's glyph as rendered in raw pixels, so the embedding can capture the semantic and syntactic cues carried by the character's visual structure. The method is studied on two tasks, language modeling and word segmentation.
Contribution
It proposes a new character embedding approach that explicitly models Chinese glyphs using CNNs, integrating visual structure into NLP representations.
Findings
Improved performance in Chinese language modeling
Enhanced accuracy in word segmentation tasks
Effective encoding of semantic and syntactic information
Abstract
Given the advantage and recent success of English character-level and subword-unit models in several NLP tasks, we consider the equivalent modeling problem for Chinese. Chinese script is logographic and many Chinese logograms are composed of common substructures that provide semantic, phonetic and syntactic hints. In this work, we propose to explicitly incorporate the visual appearance of a character's glyph in its representation, resulting in a novel glyph-aware embedding of Chinese characters. Being inspired by the success of convolutional neural networks in computer vision, we use them to incorporate the spatio-structural patterns of Chinese glyphs as rendered in raw pixels. In the context of two basic Chinese NLP tasks of language modeling and word segmentation, the model learns to represent each character's task-relevant semantic and syntactic information in the character-level…
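To make the idea in the abstract concrete, the following is a minimal sketch of mapping a glyph bitmap to an embedding vector with one convolution, a ReLU, max pooling, and a linear projection. It is an illustration only, not the authors' architecture: the 24x24 glyph size, the filter count, the embedding dimension, the random (untrained) weights, and the names `GlyphEncoder` and `toy_glyph` are all assumptions made for this example. A hand-drawn cross stands in for a rendered character so that no font file is needed.

```python
import numpy as np


def conv2d_valid(image, kernels):
    """Cross-correlate an (H, W) image with (K, kh, kw) kernels, no padding."""
    _, kh, kw = kernels.shape
    # windows has shape (H - kh + 1, W - kw + 1, kh, kw)
    windows = np.lib.stride_tricks.sliding_window_view(image, (kh, kw))
    return np.einsum("ijab,kab->kij", windows, kernels)


def max_pool2d(fmap, size):
    """Non-overlapping max pooling over a (K, H, W) feature map."""
    k, h, w = fmap.shape
    h2, w2 = h // size, w // size
    trimmed = fmap[:, : h2 * size, : w2 * size]
    return trimmed.reshape(k, h2, size, w2, size).max(axis=(2, 4))


class GlyphEncoder:
    """Hypothetical glyph-to-vector encoder: conv -> ReLU -> max pool -> linear.

    All sizes are illustrative assumptions, and the weights are random;
    in a real model they would be trained jointly with the downstream task.
    """

    def __init__(self, n_filters=8, kernel=3, pool=2, glyph_size=24,
                 embed_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        self.kernels = rng.normal(0.0, 0.1, (n_filters, kernel, kernel))
        self.pool = pool
        side = (glyph_size - kernel + 1) // pool
        self.proj = rng.normal(0.0, 0.1, (n_filters * side * side, embed_dim))

    def __call__(self, glyph):
        fmap = np.maximum(conv2d_valid(glyph, self.kernels), 0.0)  # ReLU
        pooled = max_pool2d(fmap, self.pool)
        return pooled.reshape(-1) @ self.proj


def toy_glyph(size=24):
    """Stand-in for a rendered character: a horizontal and a vertical stroke."""
    g = np.zeros((size, size))
    g[size // 2 - 1 : size // 2 + 1, 3 : size - 3] = 1.0  # horizontal stroke
    g[3 : size - 3, size // 2 - 1 : size // 2 + 1] = 1.0  # vertical stroke
    return g


encoder = GlyphEncoder()
embedding = encoder(toy_glyph())
print(embedding.shape)  # (16,)
```

In a full model, the resulting vector would take the place of (or be combined with) a lookup-table character embedding before being fed to the language model or segmenter; how the paper combines the two is not specified in the text above.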
Taxonomy
Topics: Natural Language Processing Techniques · Handwritten Text Recognition Techniques · Multimodal Machine Learning Applications
